user@spark.apache.org, 2018-01

- Re: Spark on EMR suddenly stalling - posted by Rohit Karlupia <ro...@qubole.com> on 2018/01/01 15:41:16 UTC, 4 replies.
- mesos cluster dispatcher - posted by puneetloya <pu...@gmail.com> on 2018/01/01 21:49:08 UTC, 0 replies.
- Re: Custom line/record delimiter - posted by Hyukjin Kwon <gu...@gmail.com> on 2018/01/02 01:54:32 UTC, 1 replies.
- Current way of using functions.window with Java - posted by Anton Puzanov <an...@gmail.com> on 2018/01/02 14:05:11 UTC, 0 replies.
- Re: Converting binary files - posted by "Lalwani, Jayesh" <Ja...@capitalone.com> on 2018/01/03 02:48:28 UTC, 0 replies.
- Unclosed NingWSCLient holds up a Spark appication - posted by "Lalwani, Jayesh" <Ja...@capitalone.com> on 2018/01/03 03:00:21 UTC, 0 replies.
- [Spark SQL] How to run a custom meta query for `ANALYZE TABLE` - posted by Jason Heo <ja...@gmail.com> on 2018/01/03 04:17:15 UTC, 1 replies.
- Is spark-env.sh sourced by Application Master and Executor for Spark on YARN? - posted by John Zhuge <jz...@apache.org> on 2018/01/03 06:57:35 UTC, 6 replies.
- Apache Spark - Question about Structured Streaming Sink addBatch dataframe size - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/01/03 17:53:57 UTC, 4 replies.
- Structured Streaming + Kafka - Corrupted Checkpoint Offsets / Commits - posted by William Briggs <wr...@gmail.com> on 2018/01/04 17:08:16 UTC, 1 replies.
- Java heap space OutOfMemoryError in pyspark spark-submit (spark version:2.2) - posted by Anu B Nair <an...@gmail.com> on 2018/01/05 06:08:38 UTC, 0 replies.
- Spark MLlib Question - Online Scoring of PipelineModel - posted by Gevorg Hari <ge...@gmail.com> on 2018/01/05 18:23:31 UTC, 0 replies.
- Reinforcement Learning with Spark - posted by "Md. Rezaul Karim" <re...@insight-centre.org> on 2018/01/06 01:04:39 UTC, 0 replies.
- RE: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0? - posted by Hien Luu <hi...@gmail.com> on 2018/01/06 19:42:57 UTC, 2 replies.
- Re: Spark job failing on jackson dependencies - posted by Fawze Abujaber <fa...@gmail.com> on 2018/01/06 20:50:40 UTC, 0 replies.
- Is Apache Spark-2.2.1 compatible with Hadoop-3.0.0 - posted by akshay naidu <ak...@gmail.com> on 2018/01/07 04:23:40 UTC, 5 replies.
- Limit the block size of data received by spring streaming receiver - posted by Xilang Yan <xi...@gmail.com> on 2018/01/08 02:36:49 UTC, 0 replies.
- Reverse MinMaxScaler in SparkML - posted by Tomasz Dudek <me...@gmail.com> on 2018/01/08 09:10:31 UTC, 1 replies.
- Spark Monitoring using Jolokia - posted by Irtiza Ali <ia...@an10.io> on 2018/01/08 13:55:40 UTC, 3 replies.
- binaryFiles() on directory full of directories - posted by Christopher Piggott <cp...@gmail.com> on 2018/01/08 15:03:28 UTC, 0 replies.
- Spark structured streaming time series forecasting - posted by Bogdan Cojocar <bo...@gmail.com> on 2018/01/08 15:04:04 UTC, 1 replies.
- PIG to Spark - posted by Pralabh Kumar <pr...@gmail.com> on 2018/01/08 15:25:50 UTC, 2 replies.
- Spark MakeRDD preferred workers - posted by Christopher Piggott <cp...@gmail.com> on 2018/01/08 20:51:55 UTC, 0 replies.
- select with more than 5 typed columns - posted by Nathan Kronenfeld <nk...@uncharted.software> on 2018/01/08 23:01:19 UTC, 0 replies.
- [SPARK-CORE] JVM Properties passed as -D, not being found inside UDAF classes - posted by "Uchoa, Rodrigo" <ro...@accenture.com> on 2018/01/09 13:59:08 UTC, 0 replies.
- Palantir replease under org.apache.spark? - posted by Nan Zhu <zh...@gmail.com> on 2018/01/09 17:42:50 UTC, 4 replies.
- Spark UI stdout/stderr links point to executors internal address - posted by Jhon Anderson Cardenas Diaz <jh...@gmail.com> on 2018/01/09 22:13:19 UTC, 0 replies.
- How to create security filter for Spark UI in Spark on YARN - posted by Jhon Anderson Cardenas Diaz <jh...@gmail.com> on 2018/01/09 22:23:50 UTC, 0 replies.
- Dataset API inconsistencies - posted by Alex Nastetsky <al...@verve.com> on 2018/01/10 00:45:39 UTC, 1 replies.
- py4j.protocol.Py4JJavaError: An error occurred while calling o794.parquet - posted by Liana Napalkova <li...@eurecat.org> on 2018/01/10 16:58:13 UTC, 3 replies.
- No Tasks have reported metrics yet - posted by Joel D <ga...@gmail.com> on 2018/01/10 18:00:16 UTC, 0 replies.
- Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1. - posted by Dongjoon Hyun <do...@gmail.com> on 2018/01/10 19:14:37 UTC, 2 replies.
- Regression in Spark SQL UI Tab in Spark 2.2.1 - posted by Yuval Itzchakov <yu...@gmail.com> on 2018/01/11 13:50:10 UTC, 0 replies.
- Logback + Spark 2.2.0 - posted by Sara Galindo Martínez <sa...@eurecat.org> on 2018/01/11 15:02:16 UTC, 0 replies.
- Timestamp changing while writing - posted by sk skk <sp...@gmail.com> on 2018/01/12 00:48:47 UTC, 1 replies.
- Is there alternative HiveStoragePredicateHandler#decomposePredicate? - posted by 강민우 <mi...@navercorp.com> on 2018/01/12 05:43:35 UTC, 0 replies.
- Using Logistic regression with SGD in Spark ML - posted by NinjaYali <ya...@gmail.com> on 2018/01/12 21:23:39 UTC, 0 replies.
- flatMapGroupsWithState not timing out (spark 2.2.1) - posted by daniel williams <da...@gmail.com> on 2018/01/12 22:36:27 UTC, 2 replies.
- Spark preserve timestamp - posted by sk skk <sp...@gmail.com> on 2018/01/13 00:21:40 UTC, 0 replies.
- Re: Inner join with the table itself - posted by Jacek Laskowski <ja...@japila.pl> on 2018/01/15 10:09:49 UTC, 4 replies.
- End of Stream errors in shuffle - posted by Fernando Pereira <fe...@gmail.com> on 2018/01/15 10:32:02 UTC, 1 replies.
- [Spark DataFrame]: Passing DataFrame to custom method results in NullPointerException - posted by ab...@bt.com on 2018/01/15 11:56:01 UTC, 1 replies.
- 3rd party hadoop input formats for EDI formats - posted by Saravanan Nagarajan <ns...@gmail.com> on 2018/01/15 18:01:06 UTC, 1 replies.
- [Spark ML] Positive-Only Training Classification in Scala - posted by Matt Hicks <ma...@outr.com> on 2018/01/15 18:21:33 UTC, 7 replies.
- can HDFS be a streaming source like Kafka in Spark 2.2.0? - posted by kant kodali <ka...@gmail.com> on 2018/01/15 22:41:59 UTC, 6 replies.
- Re: Broken SQL Visualization? - posted by Ted Yu <yu...@gmail.com> on 2018/01/15 23:07:38 UTC, 1 replies.
- spark-submit can find python? - posted by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2018/01/15 23:53:23 UTC, 2 replies.
- Run jobs in parallel in standalone mode - posted by Onur EKİNCİ <oe...@innova.com.tr> on 2018/01/16 08:00:57 UTC, 10 replies.
- unsubscribe - posted by Muhammad Yaseen Aftab <ya...@gaditek.com> on 2018/01/16 12:20:42 UTC, 4 replies.
- Mail List Daily Archive - posted by dmp <da...@dandymadeproductions.com> on 2018/01/16 17:26:59 UTC, 0 replies.
- Null pointer exception in checkpoint directory - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/01/16 19:42:03 UTC, 0 replies.
- Spark Streaming not reading missed data - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/01/16 21:04:38 UTC, 3 replies.
- spark streaming kafka not displaying data in local eclipse - posted by vr spark <vr...@gmail.com> on 2018/01/17 05:05:59 UTC, 0 replies.
- "Got wrong record after seeking to offset" issue - posted by Justin Miller <ju...@protectwise.com> on 2018/01/17 05:10:12 UTC, 4 replies.
- Testing Spark-Cassandra - posted by Guillermo Ortiz <ko...@gmail.com> on 2018/01/17 15:48:08 UTC, 2 replies.
- update LD_LIBRARY_PATH when running apache job in a YARN cluster - posted by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2018/01/18 01:39:33 UTC, 1 replies.
- Spark Stream is corrupted - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/01/18 04:39:19 UTC, 1 replies.
- StreamingLogisticRegressionWithSGD : Multiclass Classification : Options - posted by Sundeep Kumar Mehta <su...@gmail.com> on 2018/01/18 05:17:02 UTC, 3 replies.
- good materiala to learn apache spark - posted by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2018/01/18 06:15:58 UTC, 1 replies.
- Spark application on yarn cluster clarification - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/01/18 08:12:04 UTC, 1 replies.
- spark linear regression model fit result is different from statsmodels linear model. - posted by TonyHu <37...@qq.com> on 2018/01/18 08:12:05 UTC, 0 replies.
- Writing to Redshift from Kafka Streaming source - posted by Somasundaram Sekar <so...@tigeranalytics.com> on 2018/01/18 08:13:42 UTC, 0 replies.
- Does Spark and Hive use Same SQL parser : ANTLR - posted by Pralabh Kumar <pr...@gmail.com> on 2018/01/18 09:43:55 UTC, 0 replies.
- Writing data in HDFS high available cluster - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/01/18 10:49:10 UTC, 1 replies.
- Structured Streaming with Kafka seems to be losing config options - posted by chris snow <ch...@gmail.com> on 2018/01/18 11:13:31 UTC, 1 replies.
- Reading Hive RCFiles? - posted by Michael Segel <ms...@hotmail.com> on 2018/01/18 15:32:54 UTC, 1 replies.
- [Structured Streaming]: Structured Streaming into Redshift sink - posted by Somasundaram Sekar <so...@tigeranalytics.com> on 2018/01/19 04:34:43 UTC, 0 replies.
- Unsubscribe - posted by Anu B Nair <an...@gmail.com> on 2018/01/19 06:11:35 UTC, 3 replies.
- [Spark structured streaming] Use of (flat)mapgroupswithstate takes long time - posted by chris-sw <ch...@semmelwise.nl> on 2018/01/19 07:28:10 UTC, 2 replies.
- [ML] Allow CrossValidation ParamGrid on SVMWithSGD - posted by Tomasz Dudek <me...@gmail.com> on 2018/01/19 11:59:08 UTC, 1 replies.
- Spark MLLib vs. SciKitLearn - posted by Aakash Basu <aa...@gmail.com> on 2018/01/19 13:42:49 UTC, 1 replies.
- is there a way to write a Streaming Dataframe/Dataset to Cassandra with auto mapping? - posted by kant kodali <ka...@gmail.com> on 2018/01/19 17:24:27 UTC, 0 replies.
- external shuffle service in mesos - posted by "igor.berman" <ig...@gmail.com> on 2018/01/20 16:33:55 UTC, 4 replies.
- Saving each line of RDD as a separate file with key as the file name - posted by pooja bhojwani <po...@gmail.com> on 2018/01/20 21:56:30 UTC, 1 replies.
- Re: Reading Hive RCFiles? - posted by Prakash Joshi <pr...@gmail.com> on 2018/01/20 22:28:24 UTC, 2 replies.
- Gracefully shutdown spark streaming application - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/01/21 20:22:41 UTC, 0 replies.
- Processing huge amount of data from paged API - posted by anonymous <cl...@gmail.com> on 2018/01/21 20:33:30 UTC, 2 replies.
- Has there been any explanation on the performance degradation between spark.ml and Mllib? - posted by Stephen Boesch <ja...@gmail.com> on 2018/01/21 21:49:41 UTC, 2 replies.
- Is there any Spark ML or MLLib API for GINI for Model Evaluation? Please help! [EOM] - posted by Aakash Basu <aa...@gmail.com> on 2018/01/22 07:04:53 UTC, 0 replies.
- run spark job in yarn cluster mode as specified user - posted by sd wang <pi...@gmail.com> on 2018/01/22 07:28:20 UTC, 5 replies.
- [Help] Converting a Python Numpy code into Spark using RDD - posted by Aakash Basu <aa...@gmail.com> on 2018/01/22 07:37:50 UTC, 0 replies.
- Using window function works extremely slowly - posted by Anton Puzanov <an...@gmail.com> on 2018/01/22 08:59:12 UTC, 0 replies.
- Spark and CEP type examples - posted by Esa Heikkinen <es...@student.tut.fi> on 2018/01/22 11:38:10 UTC, 0 replies.
- Spark querying C* in Scala - posted by Conconscious <co...@gmail.com> on 2018/01/22 13:43:10 UTC, 2 replies.
- How do I extract a value in foreachRDD operation - posted by Toy <no...@gmail.com> on 2018/01/22 16:19:05 UTC, 0 replies.
- Re: spark 2.0 and spark 2.2 - posted by Xiao Li <ga...@gmail.com> on 2018/01/22 17:18:56 UTC, 0 replies.
- Re: [EXT] How do I extract a value in foreachRDD operation - posted by Michael Mansour <Mi...@symantec.com> on 2018/01/22 17:25:41 UTC, 1 replies.
- Production Critical : Data loss in spark streaming - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/01/22 17:36:03 UTC, 0 replies.
- Spark vs Snowflake - posted by Mich Talebzadeh <mi...@gmail.com> on 2018/01/22 21:51:30 UTC, 1 replies.
- Spark Streaming data loss checkpoint directory - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/01/23 02:27:28 UTC, 0 replies.
- How to hold some data in memory while processing rows in a DataFrame? - posted by David Rosenstrauch <da...@gmail.com> on 2018/01/23 03:24:12 UTC, 4 replies.
- Spark Tuning Tool - posted by Rohit Karlupia <ro...@qubole.com> on 2018/01/23 05:01:39 UTC, 9 replies.
- Spark SQL bucket pruning support - posted by Joe Wang <me...@joewang.net> on 2018/01/23 06:11:15 UTC, 0 replies.
- [Structured streaming] Merging streaming with semi-static datasets - posted by Christiaan Ras <ch...@semmelwise.nl> on 2018/01/23 11:32:30 UTC, 0 replies.
- I can't save DataFrame from running Spark locally - posted by Toy <no...@gmail.com> on 2018/01/23 19:33:11 UTC, 2 replies.
- S3 token times out during data frame "write.csv" - posted by Vasyl Harasymiv <va...@gmail.com> on 2018/01/23 22:58:45 UTC, 5 replies.
- write parquet with statistics min max with binary field - posted by Stephen Joung <st...@vcnc.co.kr> on 2018/01/24 01:30:32 UTC, 2 replies.
- uncontinuous offset in kafka will cause the spark streaming failure - posted by namesuperwood <na...@gmail.com> on 2018/01/24 05:48:47 UTC, 1 replies.
- Re: uncontinuous offset in kafka will cause the spark streamingfailure - posted by namesuperwood <na...@gmail.com> on 2018/01/24 06:45:18 UTC, 2 replies.
- Question about accumulator - posted by "hsy541@gmail.com" <hs...@gmail.com> on 2018/01/24 06:46:45 UTC, 0 replies.
- Questions about using pyspark 2.1.1 pushing data to kafka - posted by "hsy541@gmail.com" <hs...@gmail.com> on 2018/01/24 07:02:16 UTC, 0 replies.
- spark.sql call takes far too long - posted by Michael Shtelma <ms...@gmail.com> on 2018/01/24 12:16:36 UTC, 1 replies.
- Providing Kafka configuration as Map of Strings - posted by Tecno Brain <ce...@gmail.com> on 2018/01/24 20:32:44 UTC, 2 replies.
- Apache Hadoop and Spark - posted by Mutahir Ali <so...@outlook.com> on 2018/01/24 20:50:11 UTC, 1 replies.
- CI/CD for spark and scala - posted by Deepak Sharma <de...@gmail.com> on 2018/01/25 03:52:15 UTC, 0 replies.
- Re: a way to allow spark job to continue despite task failures? - posted by Sunita Arvind <su...@gmail.com> on 2018/01/25 04:34:25 UTC, 0 replies.
- Scala version changed in spark job - posted by Fawze Abujaber <fa...@gmail.com> on 2018/01/25 06:40:01 UTC, 0 replies.
- Kafka deserialization to Structured Streaming SQL - Encoders.bean result doesn't match itself? - posted by Iain Cundy <Ia...@amdocs.com> on 2018/01/25 12:11:41 UTC, 0 replies.
- Custom build - missing images on MasterWebUI - posted by Conconscious <co...@gmail.com> on 2018/01/25 18:09:11 UTC, 0 replies.
- Get broadcast (set in one method) in another method - posted by Margusja <ma...@roo.ee> on 2018/01/25 20:04:15 UTC, 1 replies.
- how to create a DataType Object using the String representation in Java using Spark 2.2.0? - posted by kant kodali <ka...@gmail.com> on 2018/01/26 00:22:21 UTC, 3 replies.
- Spark Standalone Mode, application runs, but executor is killed - posted by Chandu <ch...@gmail.com> on 2018/01/26 03:09:24 UTC, 4 replies.
- Apache Spark - Custom structured streaming data source - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/01/26 04:36:12 UTC, 2 replies.
- Best active groups, forums or contacts for Spark ? - posted by Esa Heikkinen <es...@student.tut.fi> on 2018/01/26 11:15:09 UTC, 4 replies.
- Apache Spark - Spark Structured Streaming - Watermark usage - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/01/26 18:14:16 UTC, 2 replies.
- Optimize sort merge join - posted by Antoine Bonnin <an...@c-ways.com> on 2018/01/27 14:17:08 UTC, 0 replies.
- Spark Streaming Cluster queries - posted by puneetloya <pu...@gmail.com> on 2018/01/27 17:07:25 UTC, 1 replies.
- Semi-supervised learning in MLlib - posted by Franco Victorio <vi...@gmail.com> on 2018/01/27 17:29:03 UTC, 0 replies.
- Custom Catalyst Optimizer Strategy for DataFrame Writes? - posted by CCInCharge <ch...@gmail.com> on 2018/01/27 23:17:21 UTC, 0 replies.
- Spark Dataframe Writer _temporary directory - posted by Richard Primera <ri...@woombatcg.com> on 2018/01/29 05:24:20 UTC, 0 replies.
- How and when the types of the result set are figured out in Spark? - posted by kant kodali <ka...@gmail.com> on 2018/01/29 06:52:53 UTC, 0 replies.
- mapGroupsWithState in Python - posted by ayan guha <gu...@gmail.com> on 2018/01/29 07:25:44 UTC, 2 replies.
- Spark Streaming checkpoint - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2018/01/29 18:47:14 UTC, 0 replies.
- Type Casting Error in Spark Data Frame - posted by Arnav kumar <ak...@gmail.com> on 2018/01/29 21:26:27 UTC, 4 replies.
- Schema - DataTypes.NullType - posted by Jean Georges Perrin <jg...@jgp.net> on 2018/01/29 22:05:18 UTC, 1 replies.
- spark.sql.adaptive.enabled has no effect - posted by 张万新 <ke...@gmail.com> on 2018/01/30 12:26:31 UTC, 0 replies.
- [Doubt] GridSearch for Hyperparameter Tuning in Spark - posted by Aakash Basu <aa...@gmail.com> on 2018/01/30 12:31:10 UTC, 0 replies.
- Data Integration with Chinese Social Media Sites - posted by Sanjay Kulkarni <sa...@manthan.com> on 2018/01/30 12:46:24 UTC, 0 replies.
- use kafka streams API aggregate ? - posted by "446463844@qq.com" <44...@qq.com> on 2018/01/30 14:48:53 UTC, 1 replies.
- ML:One vs Rest with crossValidator for multinomial in logistic regression - posted by michelleyang <mi...@gmail.com> on 2018/01/30 14:55:01 UTC, 1 replies.
- Re: Re: use kafka streams API aggregate ? - posted by "446463844@qq.com" <44...@qq.com> on 2018/01/30 15:08:10 UTC, 0 replies.
- [Spark Streaming]: Non-deterministic uneven task-to-machine assignment - posted by LongVehicle <sa...@gmail.com> on 2018/01/30 15:12:30 UTC, 1 replies.
- spark job error - posted by shyla deshpande <de...@gmail.com> on 2018/01/30 16:52:37 UTC, 1 replies.
- Issue with Cast in Spark Sql - posted by Arnav kumar <ak...@gmail.com> on 2018/01/31 02:48:22 UTC, 1 replies.
- why groupByKey still shuffle if SQL does "Distribute By" on same columns ? - posted by Dibyendu Bhattacharya <di...@gmail.com> on 2018/01/31 03:51:01 UTC, 0 replies.
- Spark Structured Streaming for Twitter Streaming data - posted by Divya Gehlot <di...@gmail.com> on 2018/01/31 07:26:05 UTC, 1 replies.
- Prefer Structured Streaming over Spark Streaming (DStreams)? - posted by Biplob Biswas <re...@gmail.com> on 2018/01/31 10:35:50 UTC, 2 replies.
- Singular Value Decomposition (SVD) in Spark Java - posted by Donni Khan <pr...@googlemail.com> on 2018/01/31 13:55:36 UTC, 0 replies.
- Data of ArrayType field getting truncated when saving to parquet - posted by HARSH TAKKAR <ta...@gmail.com> on 2018/01/31 14:20:17 UTC, 0 replies.
- Re: Max number of streams supported ? - posted by Michael Armbrust <mi...@databricks.com> on 2018/01/31 20:39:49 UTC, 1 replies.
- Apache Spark - Exception on adding column to Structured Streaming DataFrame - posted by M Singh <ma...@yahoo.com.INVALID> on 2018/01/31 23:35:57 UTC, 1 replies.