- Re: mapGroupsWithState in Python - posted by ayan guha <gu...@gmail.com> on 2018/02/01 00:14:53 UTC, 0 replies.
- Re: Max number of streams supported ? - posted by Yogesh Mahajan <ym...@snappydata.io> on 2018/02/01 00:46:00 UTC, 0 replies.
- Re: Spark Structured Streaming for Twitter Streaming data - posted by Tathagata Das <ta...@gmail.com> on 2018/02/01 02:30:15 UTC, 3 replies.
- FOSDEM mini-office hour? - posted by Holden Karau <ho...@pigscanfly.ca> on 2018/02/01 03:27:02 UTC, 0 replies.
- Re: ML:One vs Rest with crossValidator for multinomial in logistic regression - posted by Nicolas Paris <ni...@gmail.com> on 2018/02/01 06:20:48 UTC, 2 replies.
- [Error :] RDD TO Dataframe Spark Streaming - posted by Divya Gehlot <di...@gmail.com> on 2018/02/01 07:26:48 UTC, 0 replies.
- is there a way to create new column with timeuuid using raw spark sql ? - posted by kant kodali <ka...@gmail.com> on 2018/02/01 11:50:06 UTC, 4 replies.
- Re: [Structured Streaming] Reuse computation result - posted by Sandip Mehta <sa...@gmail.com> on 2018/02/01 12:06:34 UTC, 0 replies.
- Re: Apache Spark - Exception on adding column to Structured Streaming DataFrame - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/02/01 15:43:26 UTC, 1 replies.
- unsubscribe - posted by James Casiraghi <jc...@algebraixdata.com> on 2018/02/01 15:47:23 UTC, 4 replies.
- Spark JDBC bulk insert - posted by Subhash Sriram <su...@gmail.com> on 2018/02/01 16:49:13 UTC, 0 replies.
- does Kinesis Connector for structured streaming auto-scales receivers if a cluster is using dynamic allocation and auto-scaling? - posted by "Mikhailau, Alex" <Al...@mlb.com> on 2018/02/01 16:54:59 UTC, 0 replies.
- Structure streaming to hive with kafka 0.9 - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/02/01 17:26:45 UTC, 0 replies.
- Re: Prefer Structured Streaming over Spark Streaming (DStreams)? - posted by Biplob Biswas <re...@gmail.com> on 2018/02/02 09:15:19 UTC, 0 replies.
- Kryo serialization failed: Buffer overflow : Broadcast Join - posted by Pralabh Kumar <pr...@gmail.com> on 2018/02/02 11:38:19 UTC, 1 replies.
- Re: spark 2.2.1 - posted by Bill Schwanitz <bi...@bilsch.org> on 2018/02/02 13:23:13 UTC, 0 replies.
- [Spark Core] Limit the task duration (and kill it!) - posted by Thomas Decaux <eb...@gmail.com> on 2018/02/02 16:45:46 UTC, 0 replies.
- Re: Apache Spark - Spark Structured Streaming - Watermark usage - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/02/02 16:47:54 UTC, 3 replies.
- Running Spark 2.2.1 with extra packages - posted by Conconscious <co...@gmail.com> on 2018/02/02 19:43:35 UTC, 0 replies.
- can we expect UUID type in Spark 2.3? - posted by kant kodali <ka...@gmail.com> on 2018/02/02 21:14:04 UTC, 0 replies.
- Workarounds for OOM during serialization - posted by "J. McConnell" <j...@ubermenschconsulting.com> on 2018/02/02 21:48:40 UTC, 0 replies.
- Re: Schema - DataTypes.NullType - posted by Jean Georges Perrin <jg...@jgp.net> on 2018/02/04 18:15:19 UTC, 3 replies.
- There is no UDF0 interface? - posted by kant kodali <ka...@gmail.com> on 2018/02/04 20:23:29 UTC, 2 replies.
- high TFIDF value terms - posted by Donni Khan <pr...@googlemail.com> on 2018/02/05 11:51:58 UTC, 0 replies.
- Spark Streaming withWatermark - posted by Jiewen Shao <fi...@gmail.com> on 2018/02/06 18:11:32 UTC, 5 replies.
- Sharing spark executor pool across multiple long running spark applications - posted by Nirav Patel <np...@xactlycorp.com> on 2018/02/06 20:00:16 UTC, 2 replies.
- New to spark 2.2.1 - Problem with finding tables between different metastore db - posted by Subhajit Purkayastha <sp...@p3si.net> on 2018/02/07 05:46:01 UTC, 0 replies.
- Spark CEP with files and no streams ? - posted by Esa Heikkinen <es...@student.tut.fi> on 2018/02/07 08:52:06 UTC, 0 replies.
- How to preserve the order of parquet files? - posted by Kevin Jung <it...@samsung.com> on 2018/02/07 12:07:32 UTC, 0 replies.
- Issue with EFS checkpoint - posted by "Khan, Obaidur Rehman" <Ob...@capitalone.com> on 2018/02/07 16:08:49 UTC, 0 replies.
- [CFP] DataWorks Summit, San Jose, 2018 - posted by Yanbo Liang <yb...@gmail.com> on 2018/02/08 00:06:45 UTC, 0 replies.
- Are there any alternatives to Hive "stored by" clause as Spark 2.0 does not support it - posted by Pralabh Kumar <pr...@gmail.com> on 2018/02/08 06:25:41 UTC, 2 replies.
- Spark conf forgets cassandra host in the configuration file - posted by Ismail Bayraktar <mr...@gmail.com> on 2018/02/08 09:36:56 UTC, 0 replies.
- Unsubscribe - posted by Yosef Moatti <MO...@il.ibm.com> on 2018/02/08 18:34:17 UTC, 15 replies.
- Free access to Index Conf for Apache Spark community attendees - posted by xwu0226 <xi...@us.ibm.com> on 2018/02/08 19:47:02 UTC, 0 replies.
- Apache Spark - Structured Streaming - Updating UDF state dynamically at run time - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/02/08 22:58:22 UTC, 1 replies.
- H2O ML use - posted by Mich Talebzadeh <mi...@gmail.com> on 2018/02/09 09:24:15 UTC, 0 replies.
- Spark Dataframe and HIVE - posted by "☼ R Nair (रविशंकर नायर)" <ra...@gmail.com> on 2018/02/09 14:49:51 UTC, 24 replies.
- PySpark Tweedie GLM - posted by nhamwey <ni...@thehartford.com> on 2018/02/09 17:42:11 UTC, 1 replies.
- NullPointerException issue in LDA.train() - posted by Kevin Lam <ke...@fathomhealth.co> on 2018/02/09 20:02:43 UTC, 0 replies.
- Spark TreeAggregate Slow LogisticRegressionWithSGD - posted by Andy Zhang <an...@berkeley.edu> on 2018/02/09 22:05:39 UTC, 0 replies.
- [Structured Streaming] Commit protocol to move temp files to dest path only when complete, with code - posted by Dave Cameron <dc...@digitalocean.com.INVALID> on 2018/02/09 22:53:15 UTC, 1 replies.
- [Structured Streaming] Deserializing avro messages from kafka source using schema registry - posted by Bram <th...@gmail.com> on 2018/02/09 23:07:29 UTC, 1 replies.
- can udaf's return complex types? - posted by kant kodali <ka...@gmail.com> on 2018/02/10 13:28:50 UTC, 1 replies.
- Log analysis with GraphX - posted by Philippe de Rochambeau <ph...@free.fr> on 2018/02/10 14:49:00 UTC, 4 replies.
- optimize hive query to move a subset of data from one partition table to another table - posted by amit kumar singh <am...@gmail.com> on 2018/02/10 15:18:57 UTC, 3 replies.
- Apache Spark - Structured Streaming Query Status - field descriptions - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/02/11 01:42:26 UTC, 2 replies.
- Spark cannot find tables in Oracle database - posted by Lian Jiang <ji...@gmail.com> on 2018/02/11 02:26:38 UTC, 4 replies.
- saveAsTable does not respect spark.sql.warehouse.dir - posted by Lian Jiang <ji...@gmail.com> on 2018/02/11 17:15:32 UTC, 2 replies.
- [pyspark] structured streaming deployment & monitoring recommendation - posted by Bram <th...@gmail.com> on 2018/02/12 12:04:03 UTC, 0 replies.
- Efficient way to compare the current row with previous row contents - posted by Debabrata Ghosh <ma...@gmail.com> on 2018/02/12 12:10:13 UTC, 4 replies.
- Spark sortByKey is not lazy evaluated - posted by sandudi <ph...@gmail.com> on 2018/02/12 12:38:55 UTC, 0 replies.
- org.apache.kafka.clients.consumer.OffsetOutOfRangeException - posted by Mina Aslani <as...@gmail.com> on 2018/02/12 19:04:52 UTC, 1 replies.
- Spark on K8s with Romana - posted by Jenna Hoole <je...@gmail.com> on 2018/02/12 21:21:58 UTC, 1 replies.
- [Structured Streaming] Avoiding multiple streaming queries - posted by Priyank Shrivastava <pr...@asperasoft.com> on 2018/02/13 01:54:13 UTC, 3 replies.
- [Spark-Listener] [How-to] Listen only to specific events - posted by Naved Alam <al...@outlook.com> on 2018/02/13 08:15:51 UTC, 0 replies.
- Run Multiple Spark jobs. Reduce Execution time. - posted by akshay naidu <ak...@gmail.com> on 2018/02/13 11:13:56 UTC, 4 replies.
- Retrieve batch metadata via the spark monitoring api - posted by Hendrik Dev <he...@gmail.com> on 2018/02/13 14:20:37 UTC, 0 replies.
- Spark 2.2.1 EMR 5.11.1 Encrypted S3 bucket overwriting parquet file - posted by Stephen Robinson <St...@aquilainsight.com> on 2018/02/13 15:04:42 UTC, 0 replies.
- Why python cluster mode is not supported in standalone cluster? - posted by Ashwin Sai Shankar <as...@netflix.com.INVALID> on 2018/02/13 20:20:33 UTC, 1 replies.
- Inefficient state management in stream to stream join in 2.3 - posted by Yogesh Mahajan <ym...@snappydata.io> on 2018/02/13 21:10:30 UTC, 0 replies.
- [Spark GraphX pregel] default value for EdgeDirection not consistent between programming guide and API documentation - posted by Ramon Bejar Torres <ra...@diei.udl.cat> on 2018/02/13 23:36:10 UTC, 0 replies.
- not able to read git info from Scala Test Suite - posted by karan alang <ka...@gmail.com> on 2018/02/14 02:35:43 UTC, 0 replies.
- read parallel processing spark-cassandra - posted by sujeet jog <su...@gmail.com> on 2018/02/14 03:47:48 UTC, 0 replies.
- SparkR test script issue: unable to run run-tests.h on spark 2.2 - posted by chandan prakash <ch...@gmail.com> on 2018/02/14 09:09:23 UTC, 3 replies.
- Spark structured streaming: periodically refresh static data frame - posted by Appu K <ku...@gmail.com> on 2018/02/14 09:24:48 UTC, 5 replies.
- [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks - posted by sandeep-katta <sa...@gmail.com> on 2018/02/14 16:01:46 UTC, 0 replies.
- stdout: org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in - posted by kant kodali <ka...@gmail.com> on 2018/02/15 02:11:59 UTC, 0 replies.
- Pyspark UDF/map fucntion throws pickling exception - posted by Selvam Raman <se...@gmail.com> on 2018/02/15 11:44:07 UTC, 1 replies.
- pyspark+spacy throwing pickling exception - posted by Selvam Raman <se...@gmail.com> on 2018/02/15 12:08:24 UTC, 2 replies.
- [spark-sql] Custom Query Execution listener via conf properties - posted by kurian vs <vs...@gmail.com> on 2018/02/16 08:43:53 UTC, 1 replies.
- Does the classloader used by spark blocks the I/O calls from UDF's? - posted by kant kodali <ka...@gmail.com> on 2018/02/16 12:08:23 UTC, 0 replies.
- "Too Large DataFrame" shuffle Fetch Failed exception in Spark SQL (SPARK-16753) (SPARK-9862)(SPARK-5928)(TAGs - Spark SQL, Intermediate Level, Debug) - posted by Ashutosh Ranjan <as...@gmail.com> on 2018/02/16 12:55:34 UTC, 0 replies.
- Can spark handle this scenario? - posted by Lian Jiang <ji...@gmail.com> on 2018/02/17 00:10:20 UTC, 16 replies.
- Java Heap Space Error - posted by Vinay Muttineni <Vi...@microsoft.com.INVALID> on 2018/02/17 01:25:18 UTC, 0 replies.
- can we do self join on streaming dataset in 2.2.0? - posted by kant kodali <ka...@gmail.com> on 2018/02/18 01:16:58 UTC, 0 replies.
- Does Pyspark Support Graphx? - posted by 94035420 <gu...@qq.com> on 2018/02/18 01:36:12 UTC, 10 replies.
- Can Precompiled Stand Alone Python Application Submitted To A Spark Cluster? - posted by xiaobo <gu...@qq.com> on 2018/02/18 05:26:08 UTC, 0 replies.
- [Pyspark Streaming + ml] How to combine - posted by Romain Jouin <ro...@gmail.com> on 2018/02/18 09:28:52 UTC, 0 replies.
- GC issues with spark job - posted by Nikhil Goyal <no...@gmail.com> on 2018/02/18 23:31:44 UTC, 0 replies.
- KafkaUtils.createStream(..) is removed for API - posted by naresh Goud <na...@gmail.com> on 2018/02/19 01:17:33 UTC, 3 replies.
- [SparkQL] how are RDDs partitioned and distributed in a standalone cluster? - posted by prabhastechie <pr...@gmail.com> on 2018/02/19 02:03:52 UTC, 0 replies.
- [graphframes]how Graphframes Deal With Bidirectional Relationships - posted by xiaobo <gu...@qq.com> on 2018/02/19 03:22:02 UTC, 2 replies.
- Understand task timing - posted by Thomas Decaux <eb...@gmail.com> on 2018/02/19 10:13:40 UTC, 0 replies.
- Re: [Spark Streaming]: Non-deterministic uneven task-to-machine assignment - posted by Aleksandar Vitorovic <sa...@gmail.com> on 2018/02/19 17:42:32 UTC, 3 replies.
- Errors when running unit tests - posted by karuppayya <ka...@gmail.com> on 2018/02/20 03:00:27 UTC, 0 replies.
- Re: [graphframes]how Graphframes Deal With BidirectionalRelationships - posted by xiaobo <gu...@qq.com> on 2018/02/20 04:35:13 UTC, 1 replies.
- sqoop import job not working when spark thrift server is running. - posted by akshay naidu <ak...@gmail.com> on 2018/02/20 05:43:08 UTC, 6 replies.
- The timestamp column for kafka records doesn't seem to change - posted by kant kodali <ka...@gmail.com> on 2018/02/20 13:41:58 UTC, 1 replies.
- Save the date: ApacheCon North America, September 24-27 in Montréal - posted by Rich Bowen <rb...@apache.org> on 2018/02/20 14:21:23 UTC, 0 replies.
- Write a DataFrame with Vector values into text/csv file - posted by Mina Aslani <as...@gmail.com> on 2018/02/20 20:12:48 UTC, 0 replies.
- Serialize a DataFrame with Vector values into text/csv file - posted by Mina Aslani <as...@gmail.com> on 2018/02/20 20:23:10 UTC, 8 replies.
- Job never finishing - posted by Nikhil Goyal <no...@gmail.com> on 2018/02/20 22:52:30 UTC, 2 replies.
- what is the right syntax for self joins in Spark 2.3.0 ? - posted by kant kodali <ka...@gmail.com> on 2018/02/21 03:52:37 UTC, 4 replies.
- CSV use case - posted by SNEHASISH DUTTA <in...@gmail.com> on 2018/02/21 08:53:58 UTC, 0 replies.
- FINAL REMINDER: CFP for Apache EU Roadshow Closes 25th February - posted by Sharan F <sh...@apache.org> on 2018/02/21 17:18:07 UTC, 0 replies.
- parquet vs orc files - posted by Kane Kim <ka...@gmail.com> on 2018/02/21 20:54:17 UTC, 5 replies.
- Return statements aren't allowed in Spark closures - posted by Lian Jiang <ji...@gmail.com> on 2018/02/21 21:16:08 UTC, 4 replies.
- I got weird error from a join - posted by "hsy541@gmail.com" <hs...@gmail.com> on 2018/02/22 01:02:30 UTC, 0 replies.
- Consuming Data in Parallel using Spark Streaming - posted by "Vibhakar, Beejal" <Be...@fisglobal.com> on 2018/02/22 03:12:45 UTC, 2 replies.
- Encoder with empty bytes deserializes with non-empty bytes - posted by David Capwell <dc...@gmail.com> on 2018/02/22 05:55:53 UTC, 1 replies.
- Hortonworks Spark-Hbase-Connector does not read zookeeper configurations from spark session config ??(Spark on Yarn) - posted by Dharmin Siddesh J <si...@gmail.com> on 2018/02/22 08:04:16 UTC, 0 replies.
- Spark not releasing shuffle files in time (with very large heap) - posted by Keith Chapman <ke...@gmail.com> on 2018/02/22 08:13:15 UTC, 5 replies.
- HBase connector does not read ZK configuration from Spark session - posted by Dharmin Siddesh J <si...@gmail.com> on 2018/02/23 04:55:58 UTC, 2 replies.
- What's relationship between the TensorflowOnSpark core modules? - posted by xiaobo <gu...@qq.com> on 2018/02/23 08:43:19 UTC, 0 replies.
- What happens if I can't fit data into memory while doing stream-stream join. - posted by kant kodali <ka...@gmail.com> on 2018/02/23 10:45:46 UTC, 0 replies.
- Spark-Solr -- unresolved dependencies - posted by Selvam Raman <se...@gmail.com> on 2018/02/23 11:50:24 UTC, 0 replies.
- NotSerializableException with Trait - posted by Jean Rossier <je...@sqooba.io> on 2018/02/23 13:59:15 UTC, 0 replies.
- Reservoir sampling in parallel - posted by Patrick McCarthy <pm...@dstillery.com> on 2018/02/23 15:44:59 UTC, 0 replies.
- Spark with Kudu behaving unexpectedly when bringing down the Kudu Service - posted by ravidspark <ra...@gmail.com> on 2018/02/23 17:09:03 UTC, 0 replies.
- Apache Spark - Structured Streaming reading from Kafka some tasks take much longer - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/02/23 19:38:52 UTC, 2 replies.
- Spark on yarn, all executors are allocated in same host,how to adjust? - posted by "changhongzhao@foxmail.com" <ch...@foxmail.com> on 2018/02/24 08:16:03 UTC, 6 replies.
- Spark 2.3.1 Continuous processing does not support StreamingRelation operations.; - posted by kant kodali <ka...@gmail.com> on 2018/02/24 08:50:52 UTC, 0 replies.
- Timezone conversion using from_utc_timestamp - posted by Srinath C <sr...@gmail.com> on 2018/02/24 17:08:54 UTC, 0 replies.
- Saving spark output to multiple files as map - posted by pooja bhojwani <po...@gmail.com> on 2018/02/24 19:03:56 UTC, 0 replies.
- scala question (in spark project)- not able to call getClassSchema method in avro generated class - posted by karan alang <ka...@gmail.com> on 2018/02/24 19:34:09 UTC, 0 replies.
- Trigger.ProcessingTime("10 seconds") & Trigger.Continuous(10.seconds) - posted by naresh Goud <na...@gmail.com> on 2018/02/25 20:26:40 UTC, 2 replies.
- CATALYST rule join - posted by tan shai <ta...@gmail.com> on 2018/02/25 22:08:25 UTC, 2 replies.
- Is there a way to query dataframe views directly without going through scheduler? - posted by kant kodali <ka...@gmail.com> on 2018/02/26 09:32:15 UTC, 0 replies.
- Out of memory Error when using Collection Accumulator Spark 2.2 - posted by Patrick <ti...@gmail.com> on 2018/02/26 09:45:13 UTC, 1 replies.
- Spark EMR executor-core vs Vcores - posted by Selvam Raman <se...@gmail.com> on 2018/02/26 10:20:14 UTC, 9 replies.
- Data loss in spark job - posted by Faraz Mateen <fm...@an10.io> on 2018/02/26 11:46:00 UTC, 2 replies.
- spark 2 new stuff - posted by Mich Talebzadeh <mi...@gmail.com> on 2018/02/26 14:26:06 UTC, 1 replies.
- Spark on K8s - using files fetched by init-container? - posted by Jenna Hoole <je...@gmail.com> on 2018/02/26 18:51:53 UTC, 3 replies.
- partitionBy with partitioned column in output? - posted by Alex Nastetsky <al...@verve.com> on 2018/02/26 22:28:19 UTC, 2 replies.
- how to add columns to row when column has a different encoder? - posted by David Capwell <dc...@gmail.com> on 2018/02/26 23:50:16 UTC, 1 replies.
- SizeEstimator - posted by Xin Liu <xi...@gmail.com> on 2018/02/27 00:47:07 UTC, 7 replies.
- How does Spark Structured Streaming determine an event has arrived late? - posted by kant kodali <ka...@gmail.com> on 2018/02/27 10:26:48 UTC, 3 replies.
- Returns Null when reading data from XML Ask Question - posted by Sateesh Karuturi <sa...@gmail.com> on 2018/02/27 10:30:34 UTC, 0 replies.
- Suppressing output from Apache Ivy (?) when calling spark-submit with --packages - posted by Nicholas Chammas <ni...@gmail.com> on 2018/02/27 18:37:43 UTC, 0 replies.
- Spark MLlib: Should I call .cache before fitting a model? - posted by Gevorg Hari <ge...@gmail.com> on 2018/02/27 19:24:47 UTC, 1 replies.
- [Beginner] Kafka 0.11 header support in Spark Structured Streaming - posted by Karthik Jayaraman <as...@gmail.com> on 2018/02/27 22:51:39 UTC, 2 replies.
- Is stream-stream join by default a stateful operation? - posted by kant kodali <ka...@gmail.com> on 2018/02/28 06:55:31 UTC, 0 replies.
- Joins in spark for large tables - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/02/28 17:40:46 UTC, 0 replies.
- org.apache.spark.SparkException: Task failed while writing rows - posted by unk1102 <um...@gmail.com> on 2018/02/28 18:39:58 UTC, 7 replies.
- [Beginner] How to save Kafka Dstream data to parquet ? - posted by karthikus <as...@gmail.com> on 2018/02/28 19:09:01 UTC, 2 replies.