user@spark.apache.org, 2018-07


- Re: Setting log level to DEBUG while keeping httpclient.wire on WARN - posted by "yujhe.li" <li...@gmail.com> on 2018/07/01 02:11:59 UTC, 0 replies.
- Re: Repartition not working on a csv file - posted by "yujhe.li" <li...@gmail.com> on 2018/07/01 02:28:31 UTC, 5 replies.
- Unable to acquire N bytes of memory, got 0 - posted by 吴晓菊 <ch...@gmail.com> on 2018/07/01 12:26:56 UTC, 0 replies.
- [ANNOUNCE] Apache Spark 2.1.3 - posted by Holden Karau <ho...@pigscanfly.ca> on 2018/07/01 19:01:35 UTC, 0 replies.
- Dataframe reader does not read microseconds, but TimestampType supports microseconds - posted by Colin Williams <co...@gmail.com> on 2018/07/02 07:03:17 UTC, 1 reply.
- union of multiple twitter streams [spark-streaming-twitter_2.11] - posted by Imran Rajjad <ra...@gmail.com> on 2018/07/02 10:44:00 UTC, 0 replies.
- Error while doing stream-stream inner join (java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access) - posted by kant kodali <ka...@gmail.com> on 2018/07/02 10:56:00 UTC, 3 replies.
- Question about Spark, Inner Join and Delegation to a Parquet Table - posted by Mike Buck <Mi...@pb.com> on 2018/07/02 13:23:22 UTC, 0 replies.
- [Structured Streaming] Metrics or logs of events that are ignored due to watermark - posted by subramgr <su...@gmail.com> on 2018/07/02 18:39:38 UTC, 2 replies.
- How to avoid duplicate column names after join with multiple conditions - posted by Nirav Patel <np...@xactlycorp.com> on 2018/07/02 21:52:04 UTC, 6 replies.
- Re: Spark Druid Ingestion - posted by gosoy <go...@live.cn> on 2018/07/03 02:56:45 UTC, 0 replies.
- [G1GC] -XX: -ResizePLAB How to provide in Spark Submit - posted by Aakash Basu <aa...@gmail.com> on 2018/07/03 07:00:57 UTC, 2 replies.
- Inferring Data driven Spark parameters - posted by Aakash Basu <aa...@gmail.com> on 2018/07/03 07:34:30 UTC, 6 replies.
- Run Python User Defined Functions / code in Spark with Scala Codebase - posted by Chetan Khatri <ch...@gmail.com> on 2018/07/03 11:58:46 UTC, 8 replies.
- Re: Building SparkML vectors from long data - posted by Patrick McCarthy <pm...@dstillery.com.INVALID> on 2018/07/03 16:34:39 UTC, 0 replies.
- Number of records per micro-batch in DStream vs Structured Streaming - posted by subramgr <su...@gmail.com> on 2018/07/03 17:55:44 UTC, 1 reply.
- Re: spark 2.3.1 with kafka spark-streaming-kafka-0-10 (java.lang.AbstractMethodError) - posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com> on 2018/07/03 20:22:26 UTC, 1 reply.
- [Spark Streaming MEMORY_ONLY] Understanding Dataflow - posted by thomas lavocat <th...@univ-grenoble-alpes.fr> on 2018/07/04 08:26:53 UTC, 2 replies.
- Kill spark executor when spark runs specific stage - posted by Serega Sheypak <se...@gmail.com> on 2018/07/04 17:04:33 UTC, 0 replies.
- Re: How to branch a Stream / have multiple Sinks / do multiple Queries on one Stream - posted by chandan prakash <ch...@gmail.com> on 2018/07/05 06:00:20 UTC, 4 replies.
- structured streaming: how to keep counter of error records in log running streaming application - posted by chandan prakash <ch...@gmail.com> on 2018/07/05 06:19:20 UTC, 0 replies.
- Automatic Json Schema inference using Structured Streaming - posted by SRK <sw...@gmail.com> on 2018/07/05 07:08:46 UTC, 1 reply.
- Spark 2.3 Kubernetes error - posted by "Mamillapalli, Purna Pradeep" <Pu...@capitalone.com> on 2018/07/05 12:46:11 UTC, 0 replies.
- Spark 2.3 Kubernetes error - posted by purna pradeep <pu...@gmail.com> on 2018/07/05 12:49:33 UTC, 1 reply.
- Strange behavior of Spark Masters during rolling update - posted by bsikander <be...@gmail.com> on 2018/07/05 15:55:24 UTC, 1 reply.
- Fwd: BeakerX 1.0 released - posted by "spot@draves.org" <sp...@draves.org> on 2018/07/05 21:33:45 UTC, 0 replies.
- unsubscribe - posted by Peter <th...@yahoo.com.INVALID> on 2018/07/05 22:02:42 UTC, 1 reply.
- [SPARK on MESOS] Avoid re-fetching Spark binary - posted by Tien Dat <tp...@gmail.com> on 2018/07/06 08:00:17 UTC, 9 replies.
- Retry option and range resource configuration for Spark job on Mesos - posted by Tien Dat <tp...@gmail.com> on 2018/07/06 14:42:22 UTC, 1 reply.
- Unable to see the table created using saveAsTable From Beeline. Please help! - posted by anna stax <an...@gmail.com> on 2018/07/06 23:10:29 UTC, 2 replies.
- Structured streaming - posted by amin MH <am...@yahoo.com.INVALID> on 2018/07/07 14:20:35 UTC, 0 replies.
- spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode. - posted by kant kodali <ka...@gmail.com> on 2018/07/08 13:58:19 UTC, 9 replies.
- Re: Create an Empty dataframe - posted by Shmuel Blitz <sh...@similarweb.com> on 2018/07/08 14:43:56 UTC, 1 reply.
- repartition - posted by ry...@gmail.com on 2018/07/08 16:26:52 UTC, 1 reply.
- Register now for ApacheCon and save $250 - posted by Rich Bowen <rb...@apache.org> on 2018/07/09 14:31:16 UTC, 0 replies.
- [REST API] Rest API unusable due to application id changing - posted by bsikander <be...@gmail.com> on 2018/07/09 16:05:33 UTC, 0 replies.
- Spark on Mesos - Weird behavior - posted by Thodoris Zois <zo...@ics.forth.gr> on 2018/07/09 18:04:49 UTC, 0 replies.
- Dataframe joins - UnsupportedOperationException: Unimplemented type: IntegerType - posted by Nirav Patel <np...@xactlycorp.com> on 2018/07/09 18:53:30 UTC, 1 reply.
- Dynamic allocation not releasing executors after unpersisting all cached data - posted by Jeffrey Charles <je...@vidyard.com> on 2018/07/09 18:59:04 UTC, 2 replies.
- Kubernetes security context when submitting job through k8s servers - posted by trung kien <ki...@gmail.com> on 2018/07/09 21:05:46 UTC, 3 replies.
- Pyspark access to scala/java libraries - posted by Mohit Jaggi <mo...@gmail.com> on 2018/07/09 22:45:45 UTC, 4 replies.
- [Structured Streaming] Reading Checkpoint data - posted by subramgr <su...@gmail.com> on 2018/07/09 23:07:52 UTC, 2 replies.
- [Structured Streaming] User Define Aggregation Function - posted by subramgr <su...@gmail.com> on 2018/07/10 01:22:08 UTC, 0 replies.
- [Structured Streaming] Last processed event time always behind Direct Streaming - posted by subramgr <su...@gmail.com> on 2018/07/10 02:34:35 UTC, 0 replies.
- Run STA/LTA python function using spark streaming: java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute - posted by zakhavan <za...@unm.edu> on 2018/07/10 06:05:29 UTC, 0 replies.
- Re: Emit Custom metrics in Spark Structured Streaming job - posted by chandan prakash <ch...@gmail.com> on 2018/07/10 07:25:16 UTC, 1 reply.
- Re: Spark on Mesos - Weird behavior - posted by Pavel Plotnikov <pa...@team.wrike.com> on 2018/07/10 11:35:21 UTC, 6 replies.
- Unpivoting - posted by amin mohebbi <am...@yahoo.com.INVALID> on 2018/07/10 12:26:59 UTC, 0 replies.
- [ANNOUNCE] Apache Spark 2.2.2 - posted by Tom Graves <tg...@yahoo.com.INVALID> on 2018/07/10 13:10:12 UTC, 0 replies.
- [Structured Streaming] Custom StateStoreProvider - posted by subramgr <su...@gmail.com> on 2018/07/10 14:06:50 UTC, 4 replies.
- How Kryo serializer allocates buffer in Spark - posted by nirav <ni...@gmail.com> on 2018/07/10 17:41:51 UTC, 0 replies.
- [Spark MLib]: RDD caching behavior of KMeans - posted by mkhan37 <mu...@gmail.com> on 2018/07/10 18:21:39 UTC, 0 replies.
- Convert scientific notation DecimalType - posted by dimitris plakas <di...@gmail.com> on 2018/07/10 18:29:00 UTC, 0 replies.
- Re: Unable to alter partition. The transaction for alter partition did not commit successfully. - posted by Arun Hive <ar...@yahoo.com.INVALID> on 2018/07/10 21:31:55 UTC, 0 replies.
- [Structured Streaming] Fine tuning GC performance - posted by subramgr <su...@gmail.com> on 2018/07/10 22:42:23 UTC, 0 replies.
- DataTypes of an ArrayType - posted by dimitris plakas <di...@gmail.com> on 2018/07/11 10:37:12 UTC, 1 reply.
- Spark accessing fakes3 - posted by Patrick Roemer <qu...@gmail.com> on 2018/07/11 16:27:38 UTC, 0 replies.
- CVE-2018-1334 Apache Spark local privilege escalation vulnerability - posted by Sean Owen <sr...@apache.org> on 2018/07/11 20:16:43 UTC, 0 replies.
- CVE-2018-8024 Apache Spark XSS vulnerability in UI - posted by Sean Owen <sr...@apache.org> on 2018/07/11 20:17:18 UTC, 0 replies.
- Dataframe multiple joins with same dataframe not able to resolve correct join columns - posted by Nirav Patel <np...@xactlycorp.com> on 2018/07/11 21:52:30 UTC, 1 reply.
- How to register custom structured streaming source - posted by Farshid Zavareh <fh...@gmail.com> on 2018/07/12 07:51:44 UTC, 1 reply.
- Re: [Structured Streaming] Avoiding multiple streaming queries - posted by chandan prakash <ch...@gmail.com> on 2018/07/12 09:38:44 UTC, 9 replies.
- Unable to infer schema pf Parquet in Spark 2.0.0 - posted by Priya Ch <le...@gmail.com> on 2018/07/12 10:19:44 UTC, 1 reply.
- how to specify external jars in program with SparkConf - posted by mytramesh <tu...@gmail.com> on 2018/07/12 14:49:27 UTC, 1 reply.
- streaming from mongo - posted by Chethan <ch...@gmail.com> on 2018/07/12 15:16:27 UTC, 0 replies.
- Running Spark on Kubernetes behind a HTTP proxy - posted by "Lalwani, Jayesh" <Ja...@capitalone.com> on 2018/07/12 16:32:38 UTC, 0 replies.
- Re: Interest in adding ability to request GPU's to the spark client? - posted by Maximiliano Felice <ma...@gmail.com> on 2018/07/12 18:13:53 UTC, 2 replies.
- Re: Spark ML online serving - posted by Maximiliano Felice <ma...@gmail.com> on 2018/07/12 18:40:22 UTC, 0 replies.
- Pyspark Structured Streaming Error - posted by umargeek <um...@gmail.com> on 2018/07/12 18:53:55 UTC, 1 reply.
- Upgrading spark history server, no logs showing. - posted by bbarks <br...@pm.me> on 2018/07/12 19:00:30 UTC, 0 replies.
- Re: How to validate orc vectorization is working within spark application? - posted by umargeek <um...@gmail.com> on 2018/07/12 19:14:58 UTC, 0 replies.
- Reading multiple files in Spark / which pattern to use - posted by Marco Mistroni <mm...@gmail.com> on 2018/07/12 21:05:43 UTC, 0 replies.
- spark sql data skew - posted by 崔苗 <cu...@danale.com> on 2018/07/13 10:20:06 UTC, 5 replies.
- spark rename or access columns which has special chars " ?: - posted by Great Info <gu...@gmail.com> on 2018/07/13 12:30:02 UTC, 0 replies.
- [ML] Linear regression with SGD - posted by sandy <ac...@turing.ac.uk> on 2018/07/13 16:12:34 UTC, 0 replies.
- Re: Live Streamed Code Review today at 11am Pacific - posted by Holden Karau <ho...@pigscanfly.ca> on 2018/07/13 19:03:45 UTC, 1 reply.
- Spark on Mesos: Spark issuing hundreds of SUBSCRIBE requests / second and crashing Mesos - posted by Nimi W <ps...@gmail.com> on 2018/07/13 22:39:22 UTC, 2 replies.
- Dataset - withColumn and withColumnRenamed that accept Column type - posted by Nirav Patel <np...@xactlycorp.com> on 2018/07/14 00:06:33 UTC, 3 replies.
- Spark Shortcut - posted by Deepu Raj <de...@outlook.com> on 2018/07/14 09:15:32 UTC, 0 replies.
- Re: Do GraphFrames support streaming? - posted by kant kodali <ka...@gmail.com> on 2018/07/14 13:59:46 UTC, 4 replies.
- how to decide broadcast join data size - posted by Selvam Raman <se...@gmail.com> on 2018/07/14 21:57:22 UTC, 0 replies.
- Can I specify watermark using raw sql alone? - posted by kant kodali <ka...@gmail.com> on 2018/07/14 23:19:55 UTC, 1 reply.
- How to stop streaming jobs - posted by Dhaval Modi <dh...@gmail.com> on 2018/07/15 09:54:06 UTC, 0 replies.
- Re: Stopping StreamingContext - posted by Dhaval Modi <dh...@gmail.com> on 2018/07/15 09:59:34 UTC, 0 replies.
- Re: Properly stop applications or jobs within the application - posted by Dhaval Modi <dh...@gmail.com> on 2018/07/15 10:01:02 UTC, 0 replies.
- Re: Stopping a Spark Streaming Context gracefully - posted by Dhaval Modi <dh...@gmail.com> on 2018/07/15 10:03:30 UTC, 0 replies.
- Security in pyspark using extensions - posted by Maximiliano Patricio Méndez <mm...@despegar.com> on 2018/07/15 23:16:35 UTC, 0 replies.
- Heap Memory in Spark 2.3.0 - posted by Bryan Jeffrey <br...@gmail.com> on 2018/07/16 18:43:26 UTC, 1 reply.
- Running Production ML Pipelines - posted by Gautam Singaraju <ga...@gmail.com> on 2018/07/16 21:11:25 UTC, 1 reply.
- Performance considerations, Using microservices for ZooKeeper & Kafka in Spark Streaming - posted by Mich Talebzadeh <mi...@gmail.com> on 2018/07/16 22:46:45 UTC, 0 replies.
- Query on Profiling Spark Code - posted by Aakash Basu <aa...@gmail.com> on 2018/07/17 07:10:42 UTC, 2 replies.
- Spark streaming connecting to two kafka clusters - posted by Sathi Chowdhury <sa...@yahoo.com.INVALID> on 2018/07/17 18:45:25 UTC, 0 replies.
- joining streams from multiple kafka clusters - posted by sathich <sa...@yahoo.com> on 2018/07/17 19:08:09 UTC, 0 replies.
- Dataframe from partitioned parquet table missing partition columns from schema - posted by Nirav Patel <np...@xactlycorp.com> on 2018/07/17 22:48:55 UTC, 1 reply.
- Spark (Scala) Streaming [Convert rdd [org.bson.document] - > dataframe] - posted by Chethan <ch...@gmail.com> on 2018/07/18 18:23:00 UTC, 1 reply.
- Re: [STRUCTURED STREAM] Join static dataset in state function (flatMapGroupsWithState) - posted by Gerard Maas <ge...@gmail.com> on 2018/07/19 13:19:42 UTC, 1 reply.
- [Spark Structured Streaming on K8S]: Debug - File handles/descriptor (unix pipe) leaking - posted by Abhishek Tripathi <ak...@gmail.com> on 2018/07/19 14:02:48 UTC, 2 replies.
- Arrow type issue with Pandas UDF - posted by Patrick McCarthy <pm...@dstillery.com.INVALID> on 2018/07/19 14:07:31 UTC, 3 replies.
- Compute /Storage Calculation - posted by Deepu Raj <de...@outlook.com> on 2018/07/19 17:39:45 UTC, 0 replies.
- Mulitple joins with same Dataframe throws Ambiguous reference error - posted by Nirav Patel <np...@xactlycorp.com> on 2018/07/19 21:11:29 UTC, 0 replies.
- Re: Mulitple joins with same Dataframe throws AnalysisException: resolved attribute(s) - posted by Nirav Patel <np...@xactlycorp.com> on 2018/07/19 21:13:16 UTC, 2 replies.
- Parquet - posted by amin mohebbi <am...@yahoo.com.INVALID> on 2018/07/20 01:34:40 UTC, 1 reply.
- Query on Spark Hive with kerberos Enabled on Kubernetes - posted by "Garlapati, Suryanarayana (Nokia - IN/Bangalore)" <su...@nokia.com> on 2018/07/20 15:06:16 UTC, 2 replies.
- [SPARK-SQL] Reading JSON column as a DataFrame and keeping partitioning information - posted by Daniel Mateus Pires <dm...@gmail.com> on 2018/07/20 16:56:33 UTC, 0 replies.
- Filtering DataType column with Timestamp - posted by fmilano <fm...@gmail.com> on 2018/07/20 17:56:35 UTC, 0 replies.
- Spark-Streaming-Kafka10_2.11 on Spark 2.3 - posted by Bryan Jeffrey <br...@gmail.com> on 2018/07/20 18:55:41 UTC, 3 replies.
- Apache Spark Cluster - posted by Uğur Sopaoğlu <us...@gmail.com> on 2018/07/23 12:21:57 UTC, 0 replies.
- Where can I read the Kafka offsets in SparkSQL application - posted by "John, Vishal (Agoda)" <Vi...@agoda.com.INVALID> on 2018/07/24 11:30:28 UTC, 1 reply.
- How to read json data from kafka and store to hdfs with spark structued streaming? - posted by dddaaa <da...@gmail.com> on 2018/07/24 22:38:47 UTC, 5 replies.
- Live Code Reviews, Coding, and Dev Tools - posted by Holden Karau <ho...@pigscanfly.ca> on 2018/07/25 03:54:11 UTC, 0 replies.
- ***UNCHECKED*** How dose spark streaming program （written with scala）call python file - posted by 康逸之 <ky...@icloud.com.INVALID> on 2018/07/25 07:34:24 UTC, 0 replies.
- How dose spark streaming program call python file - posted by 康逸之 <ky...@icloud.com.INVALID> on 2018/07/25 07:38:29 UTC, 0 replies.
- ***UNCHECKED*** UNSUBSCRIBE - posted by sridhararao mutluri <dr...@hotmail.com> on 2018/07/25 07:40:42 UTC, 0 replies.
- Bug in Window Function - posted by Elior Malul <el...@gmail.com> on 2018/07/25 08:04:10 UTC, 1 reply.
- Backpressure initial rate not working - posted by Biplob Biswas <re...@gmail.com> on 2018/07/25 14:23:36 UTC, 5 replies.
- Use Arrow instead of Pickle without pandas_udf - posted by Hichame El Khalfi <hi...@elkhalfi.com> on 2018/07/25 20:27:56 UTC, 4 replies.
- Split a row into multiple rows Java - posted by nookala <sr...@gmail.com> on 2018/07/26 02:03:07 UTC, 1 reply.
- Exceptions with simplest Structured Streaming example - posted by Jonathan Apple <jo...@qio.io> on 2018/07/26 12:41:16 UTC, 2 replies.
- Optimizing a join with bucketing - posted by Vitaliy Pisarev <vi...@biocatch.com> on 2018/07/26 15:36:58 UTC, 0 replies.
- Question of spark streaming - posted by utkarsh rathor <uu...@gmail.com> on 2018/07/27 12:14:53 UTC, 1 reply.
- Iterative rdd union + reduceByKey operations on small dataset leads to "No space left on device" error on account of lot of shuffle spill. - posted by dineshdharme <di...@gmail.com> on 2018/07/27 12:52:34 UTC, 2 replies.
- How to Create one DB connection per executor and close it after the job is done? - posted by kant kodali <ka...@gmail.com> on 2018/07/28 06:33:37 UTC, 3 replies.
- modeling timestamp in Avro messages (read using Spark Structured Streaming) - posted by karan alang <ka...@gmail.com> on 2018/07/29 16:53:59 UTC, 0 replies.
- [SparkContext] will application immediately stop after sc.stop()? - posted by bsikander <be...@gmail.com> on 2018/07/29 18:56:38 UTC, 0 replies.
- Big Burst of Streaming Changes - posted by ayan guha <gu...@gmail.com> on 2018/07/29 23:54:19 UTC, 0 replies.
- How to read csv in dataframe - posted by Lehak Dharmani <le...@intellibridge.co> on 2018/07/30 08:56:16 UTC, 0 replies.
- How to add a new source to exsting struct streaming application, like a kafka source - posted by 杨浩 <ya...@gmail.com> on 2018/07/30 09:13:11 UTC, 0 replies.
- Using Spark Streaming for analyzing changing data - posted by oripwk <or...@gmail.com> on 2018/07/30 11:48:27 UTC, 0 replies.
- Re: How to reduceByKeyAndWindow in Structured Streaming? - posted by oripwk <or...@gmail.com> on 2018/07/30 11:49:11 UTC, 0 replies.
- Kafka backlog - spark structured streaming - posted by Kailash Kalahasti <ka...@gmail.com> on 2018/07/30 15:03:51 UTC, 2 replies.
- sorting on dataframe causes out of memory (java heap space) - posted by msbreuer <ms...@gmail.com> on 2018/07/30 16:44:32 UTC, 0 replies.
- Executor lost for unknown reasons error Spark 2.3 on kubernetes - posted by "Mamillapalli, Purna Pradeep" <Pu...@capitalone.com> on 2018/07/30 20:47:41 UTC, 3 replies.
- How to do PCA with Spark Streaming Dataframe? - posted by Aakash Basu <aa...@gmail.com> on 2018/07/31 09:48:49 UTC, 1 reply.
- not able to make Yarn dynamically allocate resources for Spark - posted by Anton Puzanov <an...@gmail.com> on 2018/07/31 12:36:45 UTC, 0 replies.