user@spark.apache.org, 2018-08

- Clearing usercache on EMR [pyspark] - posted by Shuporno Choudhury <sh...@gmail.com> on 2018/08/01 07:19:47 UTC, 1 replies.
- How to make Yarn dynamically allocate resources for Spark - posted by Anton Puzanov <an...@gmail.com> on 2018/08/01 07:30:34 UTC, 1 replies.
- How to use window method with direct kafka streaming ? - posted by "fat.wei" <fa...@aliyun.com.INVALID> on 2018/08/01 09:17:02 UTC, 0 replies.
- Data quality measurement for streaming data with apache spark - posted by Uttam <ud...@gmail.com> on 2018/08/01 10:11:35 UTC, 0 replies.
- Re: How to add a new source to exsting struct streaming application, like a kafka source - posted by Robb Greathouse <rg...@redhat.com> on 2018/08/01 16:36:07 UTC, 1 replies.
- Re: Saving dataframes with partitionBy: append partitions, overwrite within each - posted by Nirav Patel <np...@xactlycorp.com> on 2018/08/01 19:11:21 UTC, 3 replies.
- Overwrite only specific partition with hive dynamic partitioning - posted by Nirav Patel <np...@xactlycorp.com> on 2018/08/01 19:24:25 UTC, 0 replies.
- RE: Split a row into multiple rows Java - posted by nookala <sr...@gmail.com> on 2018/08/01 20:05:12 UTC, 3 replies.
- Spark Memory Requirement - posted by msbreuer <ms...@gmail.com> on 2018/08/01 21:21:06 UTC, 0 replies.
- unsubscribe - posted by Eco Super <el...@gmail.com> on 2018/08/02 06:24:19 UTC, 2 replies.
- Can we deploy python script on a spark cluster - posted by Lehak Dharmani <le...@intellibridge.co> on 2018/08/02 12:46:28 UTC, 1 replies.
- re: streaming, batch / spark 2.2.1 - posted by Peter Liu <pe...@gmail.com> on 2018/08/02 18:42:22 UTC, 2 replies.
- Spark on Kubernetes: Kubernetes killing executors because of overallocation of memory - posted by Jayesh Lalwani <ja...@capitalone.com> on 2018/08/02 19:34:16 UTC, 1 replies.
- Re: [External Sender] re: streaming, batch / spark 2.2.1 - posted by Jayesh Lalwani <ja...@capitalone.com> on 2018/08/02 20:11:57 UTC, 1 replies.
- Insert into dynamic partitioned hive/parquet table throws error - Partition spec contains non-partition columns - posted by Nirav Patel <np...@xactlycorp.com> on 2018/08/03 00:01:42 UTC, 1 replies.
- Machine Learning with window data - posted by Christiaan Ras <ch...@semmelwise.nl> on 2018/08/03 10:01:27 UTC, 1 replies.
- How does readStream() and writeStream() work? - posted by dddaaa <da...@gmail.com> on 2018/08/03 12:19:32 UTC, 0 replies.
- Does row_number over a window cause a shuffle? - posted by Jayesh Lalwani <ja...@capitalone.com> on 2018/08/03 15:15:25 UTC, 0 replies.
- Replacing groupBykey() with reduceByKey() - posted by Bathi CCDB <ba...@gmail.com> on 2018/08/03 22:05:35 UTC, 3 replies.
- Broadcast variable size limit? - posted by klrmowse <kl...@gmail.com> on 2018/08/05 14:51:51 UTC, 3 replies.
- spark structured streaming with file based sources and sinks - posted by Koert Kuipers <ko...@tresata.com> on 2018/08/06 16:31:38 UTC, 0 replies.
- Re: Handle BlockMissingException in pyspark - posted by John Zhuge <jo...@gmail.com> on 2018/08/06 19:49:13 UTC, 0 replies.
- Driver OOM when using writing parquet - posted by Nikhil Goyal <no...@gmail.com> on 2018/08/06 23:59:20 UTC, 0 replies.
- need workaround around HIVE-11625 / DISTRO-800 - posted by Pranav Agrawal <pr...@gmail.com> on 2018/08/07 08:19:05 UTC, 1 replies.
- Dynamic partitioning weird behavior - posted by Nikolay Skovpin <ko...@gmail.com> on 2018/08/07 14:47:43 UTC, 0 replies.
- Newbie question on how to extract column value - posted by James Starks <su...@protonmail.com.INVALID> on 2018/08/07 15:09:11 UTC, 2 replies.
- Updating dynamic partitioned hive table throws error - Partition spec contains non-partition columns - posted by nirav <ni...@gmail.com> on 2018/08/07 18:00:18 UTC, 0 replies.
- Unable to see completed application in Spark 2 history web UI - posted by Fawze Abujaber <fa...@gmail.com> on 2018/08/08 04:56:41 UTC, 9 replies.
- Run/install tensorframes on zeppelin pyspark - posted by Spico Florin <sp...@gmail.com> on 2018/08/08 13:59:47 UTC, 2 replies.
- Data source jdbc does not support streamed reading - posted by James Starks <su...@protonmail.com.INVALID> on 2018/08/08 16:23:54 UTC, 0 replies.
- groupBy and then coalesce impacts shuffle partitions in unintended way - posted by Koert Kuipers <ko...@tresata.com> on 2018/08/08 19:39:04 UTC, 10 replies.
- Intellij run Spark unit test - posted by Daniel Zhang <ja...@hotmail.com> on 2018/08/09 00:35:44 UTC, 0 replies.
- [Structured Streaming] Understanding waterMark, flatMapGroupWithState and possibly windowing - posted by subramgr <su...@gmail.com> on 2018/08/09 02:35:13 UTC, 0 replies.
- Error in java_gateway.py - posted by shubham <sh...@gmail.com> on 2018/08/09 05:48:01 UTC, 1 replies.
- Understanding spark.executor.memoryOverhead - posted by Akash Mishra <ak...@gmail.com> on 2018/08/09 10:14:46 UTC, 0 replies.
- Structured Streaming doesn't write checkpoint log when I use coalesce - posted by WangXiaolong <ro...@163.com> on 2018/08/09 11:38:03 UTC, 1 replies.
- Plans for Session Windows? - posted by Mike Sukmanowsky <mi...@gmail.com> on 2018/08/09 15:02:54 UTC, 5 replies.
- Re: Implementing .zip file codec - posted by mytramesh <tu...@gmail.com> on 2018/08/09 16:36:34 UTC, 0 replies.
- Kryoserializer with pyspark - posted by Hichame El Khalfi <hi...@elkhalfi.com> on 2018/08/09 17:25:44 UTC, 0 replies.
- How does mapPartitions function work in Spark streaming on DStreams? - posted by zakhavan <za...@unm.edu> on 2018/08/09 17:27:24 UTC, 0 replies.
- [Structured Streaming] Two watermarks and StreamingQueryListener - posted by subramgr <su...@gmail.com> on 2018/08/09 22:15:27 UTC, 2 replies.
- MultilayerPerceptronClassifier - posted by Mina Aslani <as...@gmail.com> on 2018/08/10 03:16:31 UTC, 0 replies.
- Spark Sparser library - posted by umargeek <um...@gmail.com> on 2018/08/10 05:48:54 UTC, 1 replies.
- Using Logback.xml with Spark - posted by adithya kanumalla <ad...@gmail.com> on 2018/08/10 10:46:19 UTC, 0 replies.
- How to get MultilayerPerceptronClassifier model parameters? - posted by Mina Aslani <as...@gmail.com> on 2018/08/10 14:37:20 UTC, 0 replies.
- Why is the max iteration for svd not configurable in mllib? - posted by Sam Lendle <sl...@pandora.com> on 2018/08/10 18:15:41 UTC, 0 replies.
- How to parallelize zip file processing? - posted by mytramesh <tu...@gmail.com> on 2018/08/10 20:54:43 UTC, 2 replies.
- [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ? - posted by chandan prakash <ch...@gmail.com> on 2018/08/11 16:33:35 UTC, 0 replies.
- executing stored procedure through spark - posted by amit kumar singh <am...@gmail.com> on 2018/08/12 15:56:36 UTC, 1 replies.
- Accessing a dataframe from another Singleton class (Python) - posted by Aakash Basu <aa...@gmail.com> on 2018/08/13 06:47:32 UTC, 0 replies.
- CVE-2018-11770: Apache Spark standalone master, Mesos REST APIs not controlled by authentication - posted by Sean Owen <sr...@apache.org> on 2018/08/13 14:24:04 UTC, 0 replies.
- How to convert Spark Streaming to Static Dataframe on the fly and pass it to a ML Model as batch - posted by Aakash Basu <aa...@gmail.com> on 2018/08/14 07:31:13 UTC, 2 replies.
- Sending data from ZeroMQ to Spark Streaming API with Python - posted by oreogundipe <or...@gmail.com> on 2018/08/14 10:35:46 UTC, 0 replies.
- Custom state store provider based on RocksDB - posted by Alexander Chermenin <a....@gmail.com> on 2018/08/14 12:40:00 UTC, 0 replies.
- Spark CEP - posted by Esa Heikkinen <es...@student.tut.fi> on 2018/08/14 13:20:30 UTC, 0 replies.
- [SPARK-24771] Upgrade AVRO version from 1.7.7 to 1.8 - posted by Wenchen Fan <cl...@gmail.com> on 2018/08/15 02:29:33 UTC, 0 replies.
- spark driver pod stuck in Waiting: PodInitializing state in Kubernetes - posted by purna pradeep <pu...@gmail.com> on 2018/08/15 11:45:10 UTC, 2 replies.
- Java API for statistics of spark job running on yarn - posted by Serkan TAS <Se...@enerjisa.com> on 2018/08/15 12:00:17 UTC, 0 replies.
- from_json function - posted by dbolshak <bo...@gmail.com> on 2018/08/15 14:51:56 UTC, 2 replies.
- Shuffle uses Direct Memory Buffer even after setting "spark.shuffle.io.preferDirectBufs = false" - posted by Vaibhav Kulkarni <va...@gmail.com> on 2018/08/15 15:30:12 UTC, 0 replies.
- [K8S] Spark initContainer custom bootstrap support for Spark master - posted by Li Gao <li...@gmail.com> on 2018/08/15 16:11:49 UTC, 2 replies.
- Dynamic Allocation not removing executors - posted by Maximiliano Patricio Méndez <mm...@despegar.com> on 2018/08/15 19:38:47 UTC, 0 replies.
- from_json schema order - posted by Brandon Geise <br...@gmail.com> on 2018/08/15 22:36:54 UTC, 0 replies.
- java.lang.UnsupportedOperationException: No Encoder found for Set[String] - posted by V0lleyBallJunki3 <ve...@gmail.com> on 2018/08/16 01:59:52 UTC, 3 replies.
- JdbcRDD - schema always resolved as nullable=true - posted by Subhash Sriram <su...@gmail.com> on 2018/08/16 02:58:22 UTC, 0 replies.
- [ANNOUNCE] Apache Toree 0.2.0-incubating Released - posted by Luciano Resende <lr...@apache.org> on 2018/08/16 03:17:53 UTC, 0 replies.
- Re: Structured streaming: Tried to fetch $offset but the returned record offset was ${record.offset}" - posted by an...@gmail.com, an...@gmail.com on 2018/08/16 11:27:10 UTC, 0 replies.
- Pass config file through spark-submit - posted by James Starks <su...@protonmail.com.INVALID> on 2018/08/16 14:29:52 UTC, 2 replies.
- [Spark Streaming] [ML]: Exception handling for the transform method of Spark ML pipeline model - posted by sudododo <JI...@GMAIL.COM> on 2018/08/16 15:08:18 UTC, 1 replies.
- java.lang.IndexOutOfBoundsException: len is negative - when data size increases - posted by Deepak Sharma <de...@gmail.com> on 2018/08/16 15:25:02 UTC, 1 replies.
- something happened to MemoryStream after spark 2.3 - posted by Koert Kuipers <ko...@tresata.com> on 2018/08/16 20:52:26 UTC, 0 replies.
- Use Spark extension points to implement row-level security - posted by Richard Siebeling <rs...@gmail.com> on 2018/08/17 06:55:49 UTC, 2 replies.
- java.nio.file.FileSystemException: /tmp/spark- .._cache : No space left on device - posted by "Polisetti, Venkata Siva Rama Gopala Krishna" <vp...@spglobal.com> on 2018/08/17 13:20:24 UTC, 0 replies.
- Re: java.nio.file.FileSystemException: /tmp/spark- .._cache : No space left on device - posted by "Jeevan K. Srivatsa" <je...@gmail.com> on 2018/08/17 15:13:52 UTC, 1 replies.
- Two different Hive instances running - posted by Fabio Wada <fa...@servix.com.INVALID> on 2018/08/17 18:21:47 UTC, 2 replies.
- Pyspark error when converting string to timestamp in map function - posted by Keith Chapman <ke...@gmail.com> on 2018/08/17 23:50:34 UTC, 0 replies.
- csv reader performance with multiline option - posted by Nirav Patel <np...@xactlycorp.com> on 2018/08/18 16:07:41 UTC, 1 replies.
- Refresh broadcast variable when it isn't the value. - posted by Guillermo Ortiz Fernández <gu...@gmail.com> on 2018/08/19 20:41:41 UTC, 0 replies.
- Why repartitionAndSortWithinPartitions slower than MapReducer - posted by 周浥尘 <zh...@gmail.com> on 2018/08/20 12:52:57 UTC, 2 replies.
- No space left on device - posted by Steve Lewis <lo...@gmail.com> on 2018/08/20 16:08:33 UTC, 4 replies.
- Unsubscribe - posted by Happy每一天 <52...@qq.com> on 2018/08/21 03:39:05 UTC, 1 replies.
- Spark with Scala : understanding closures or best way to take udf registrations' code out of main and put in utils - posted by aastha <aa...@gmail.com> on 2018/08/21 06:14:20 UTC, 0 replies.
- Re: Structured Streaming on Kubernetes - posted by puneetloya <pu...@gmail.com> on 2018/08/21 18:33:05 UTC, 0 replies.
- Insert a pyspark dataframe in postgresql - posted by dimitris plakas <di...@gmail.com> on 2018/08/21 21:50:27 UTC, 0 replies.
- CBO not predicting cardinality on partition columns for Parquet tables - posted by rajat mishra <mi...@gmail.com> on 2018/08/22 04:15:12 UTC, 0 replies.
- : Failed to create file system watcher service: User limit of inotify instances reached or too many open files - posted by "Polisetti, Venkata Siva Rama Gopala Krishna" <vp...@spglobal.com> on 2018/08/22 08:24:55 UTC, 0 replies.
- How to merge multiple rows - posted by msbreuer <ms...@gmail.com> on 2018/08/22 20:04:24 UTC, 2 replies.
- About the question of Spark Structured Streaming window output - posted by "zrc@zjdex.com" <zr...@zjdex.com> on 2018/08/23 02:30:43 UTC, 0 replies.
- How to deal with context dependent computing? - posted by JF Chen <da...@gmail.com> on 2018/08/23 02:52:27 UTC, 3 replies.
- Caching small Rdd's take really long time and Spark seems frozen - posted by Guillermo Ortiz <ko...@gmail.com> on 2018/08/23 13:08:58 UTC, 4 replies.
- Slow Query Plan Generation - posted by "Rosbrook, Andrew J" <an...@jpmchase.com.INVALID> on 2018/08/24 10:37:29 UTC, 2 replies.
- CBO not working for Parquet Files - posted by rajat mishra <mi...@gmail.com> on 2018/08/24 15:11:04 UTC, 0 replies.
- Handling Very Large volume(500TB) data using spark - posted by Great Info <gu...@gmail.com> on 2018/08/25 02:54:13 UTC, 0 replies.
- Fw:multiple group by action - posted by 崔苗 <cu...@danale.com> on 2018/08/25 02:55:07 UTC, 1 replies.
- Spark Structured Streaming using S3 as data source - posted by sherif98 <sh...@gmail.com> on 2018/08/26 09:22:17 UTC, 2 replies.
- Re: About the question of Spark Structured Streaming window output - posted by Gerard Maas <ge...@gmail.com> on 2018/08/26 21:00:19 UTC, 5 replies.
- java.io.NotSerializableException: org.apache.spark.sql.TypedColumn - posted by zzcclp <44...@qq.com> on 2018/08/27 03:19:36 UTC, 0 replies.
- How to use 'insert overwrite [local] directory' correctly? - posted by Bang Xiao <ch...@gmail.com> on 2018/08/27 07:33:40 UTC, 3 replies.
- Pitfalls of partitioning by host? - posted by Patrick McCarthy <pm...@dstillery.com.INVALID> on 2018/08/27 17:22:41 UTC, 10 replies.
- How do I generate current UTC timestamp in raw spark sql? - posted by kant kodali <ka...@gmail.com> on 2018/08/28 09:03:50 UTC, 1 replies.
- RDD Collect Issue - posted by Aakash Basu <aa...@gmail.com> on 2018/08/28 12:08:34 UTC, 0 replies.
- Re: [External Sender] Pitfalls of partitioning by host? - posted by Jayesh Lalwani <ja...@capitalone.com> on 2018/08/28 17:22:18 UTC, 0 replies.
- Which Py4J version goes with Spark 2.3.1? - posted by Aakash Basu <aa...@gmail.com> on 2018/08/29 06:59:49 UTC, 1 replies.
- Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed. - posted by Guillermo Ortiz Fernández <gu...@gmail.com> on 2018/08/29 07:10:32 UTC, 3 replies.
- java.lang.OutOfMemoryError: Java heap space - Spark driver. - posted by Guillermo Ortiz Fernández <gu...@gmail.com> on 2018/08/29 08:38:49 UTC, 0 replies.
- Spark udf from external jar without enabling Hive - posted by Swapnil Chougule <th...@gmail.com> on 2018/08/29 10:42:23 UTC, 0 replies.
- Parallelism: behavioural difference in version 1.2 and 2.1!? - posted by "jeevan.ks" <je...@gmail.com> on 2018/08/29 13:55:43 UTC, 2 replies.
- Spark code to write to MySQL and Hive - posted by ry...@gmail.com on 2018/08/29 14:49:15 UTC, 4 replies.
- Is there a plan for official spark-avro/spark-orc read/write library using Data Source V2 - posted by yxchen <yx...@linkedin.com> on 2018/08/29 20:59:24 UTC, 0 replies.
- DStream reduceByKeyAndWindow not using checkpointed data for inverse reducing old data - posted by N B <nb...@gmail.com> on 2018/08/29 23:16:04 UTC, 0 replies.
- Long term arbitrary stateful processing - best practices - posted by monohusche <mo...@gmail.com> on 2018/08/30 02:37:12 UTC, 0 replies.
- Spark Structured Streaming checkpointing with S3 data source - posted by sherif98 <sh...@gmail.com> on 2018/08/30 11:32:24 UTC, 0 replies.
- Default Java Opts Standalone - posted by Evelyn Bayes <u5...@gmail.com> on 2018/08/30 11:42:13 UTC, 1 replies.
- spark structured streaming jobs working in HDP2.6 fail in HDP3.0 - posted by Lian Jiang <ji...@gmail.com> on 2018/08/30 15:59:12 UTC, 2 replies.
- ConcurrentModificationExceptions with CachedKafkaConsumers - posted by Bryan Jeffrey <br...@gmail.com> on 2018/08/30 17:27:15 UTC, 4 replies.
- Local mode vs client mode with one executor - posted by Guillermo Ortiz <ko...@gmail.com> on 2018/08/30 21:00:48 UTC, 0 replies.
- CSV parser - how to parse column containing json data - posted by Nirav Patel <np...@xactlycorp.com> on 2018/08/30 23:19:06 UTC, 1 replies.
- Structured Streaming : Custom Source and Sink Development and PySpark. - posted by "Ramaswamy, Muthuraman" <Mu...@viasat.com> on 2018/08/31 01:23:14 UTC, 1 replies.
- is spark TempView thread safe - posted by 崔苗 <cu...@danale.com> on 2018/08/31 06:50:14 UTC, 0 replies.
- Type change support in spark parquet read-write - posted by Swapnil Chougule <th...@gmail.com> on 2018/08/31 09:32:54 UTC, 0 replies.
- read snappy compressed files in spark - posted by Ricky <ri...@gmail.com> on 2018/08/31 10:35:42 UTC, 0 replies.