- Re: How to use ManualClock with Spark streaming - posted by Saisai Shao <sa...@gmail.com> on 2017/03/01 01:39:58 UTC, 1 reply.
- Re: Why Spark cannot get the derived field of case class in Dataset? - posted by Michael Armbrust <mi...@databricks.com> on 2017/03/01 02:46:24 UTC, 0 replies.
- Can't transform RDD for the second time - posted by jeremycod <zo...@gmail.com> on 2017/03/01 04:17:08 UTC, 1 reply.
- Re: Spark - Not contains on Spark dataframe - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/03/01 04:29:38 UTC, 1 reply.
- Re: Custom log4j.properties on AWS EMR - posted by Prithish <pr...@gmail.com> on 2017/03/01 05:33:03 UTC, 0 replies.
- Re: RDD blocks on Spark Driver - posted by Prithish <pr...@gmail.com> on 2017/03/01 05:35:16 UTC, 0 replies.
- Jar not in shell classpath in Windows 10 - posted by Justin Pihony <ju...@gmail.com> on 2017/03/01 06:05:19 UTC, 2 replies.
- Re: using spark to load a data warehouse in real time - posted by Jörn Franke <jo...@gmail.com> on 2017/03/01 07:25:27 UTC, 4 replies.
- Re: Re: spark append files to the same hdfs dir issue for LeaseExpiredException - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2017/03/01 09:25:46 UTC, 0 replies.
- Number of partitions in Dataset aggregations - posted by Jakub Dubovsky <sp...@gmail.com> on 2017/03/01 09:28:35 UTC, 0 replies.
- Spark driver CPU usage - posted by "Phadnis, Varun" <ph...@sky.optymyze.com> on 2017/03/01 12:11:18 UTC, 3 replies.
- [Spark] Accumulators or count() - posted by "Charles O. Bajomo" <ch...@pretechconsulting.co.uk> on 2017/03/01 12:26:42 UTC, 1 reply.
- Spark Streaming - java.lang.ClassNotFoundException Scala anonymous function - posted by Dominik Safaric <do...@gmail.com> on 2017/03/01 13:19:50 UTC, 1 reply.
- Re: Spark Streaming - java.lang.ClassNotFoundException Scala anonymous function - posted by Sean Owen <so...@cloudera.com> on 2017/03/01 13:51:27 UTC, 0 replies.
- Continuous or Categorical - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2017/03/01 14:36:08 UTC, 1 reply.
- question on transforms for spark 2.0 dataset - posted by Bill Schwanitz <bi...@bilsch.org> on 2017/03/01 16:21:54 UTC, 3 replies.
- Re: Why spark history server does not show RDD even if it is persisted? - posted by Parag Chaudhari <pa...@gmail.com> on 2017/03/01 17:33:14 UTC, 0 replies.
- Combining reading from Kafka and HDFS w/ Spark Streaming - posted by Mike Thomsen <mi...@gmail.com> on 2017/03/01 19:50:53 UTC, 1 reply.
- reaasign location of partitions - posted by Simona Rabinovici-Cohen <SI...@il.ibm.com> on 2017/03/01 21:17:34 UTC, 0 replies.
- Spark 2.0 issue with left_outer join - posted by Ankur Srivastava <an...@gmail.com> on 2017/03/01 21:28:28 UTC, 4 replies.
- Best way to assign a unique IDs to row groups - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/03/01 21:50:10 UTC, 0 replies.
- strange usage of tempfile.mkdtemp() in PySpark mllib.recommendation doctest - posted by Han-Cheol Cho <ha...@nhn-techorus.com> on 2017/03/02 08:30:38 UTC, 0 replies.
- Restart if driver gets insufficient resources - posted by vimal dinakaran <vi...@gmail.com> on 2017/03/02 08:32:39 UTC, 0 replies.
- Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources - posted by Aseem Bansal <as...@gmail.com> on 2017/03/02 11:04:51 UTC, 10 replies.
- spark keeps on creating executors and each one fails with "TransportClient has not yet been set." - posted by Aseem Bansal <as...@gmail.com> on 2017/03/02 12:04:15 UTC, 1 reply.
- SimpleConfigObject - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2017/03/02 19:45:58 UTC, 0 replies.
- How to tune groupBy operations in Spark 2.x? - posted by SRK <sw...@gmail.com> on 2017/03/02 21:14:40 UTC, 0 replies.
- Re: spark sql: full outer join optimization - posted by 任弘迪 <ry...@gmail.com> on 2017/03/03 07:53:40 UTC, 0 replies.
- kafka and zookeeper set up in prod for spark streaming - posted by Mich Talebzadeh <mi...@gmail.com> on 2017/03/03 08:15:51 UTC, 4 replies.
- Server Log Processing - Regex or ElasticSearch? - posted by Gaurav1809 <ga...@gmail.com> on 2017/03/03 08:27:57 UTC, 1 reply.
- Resource manager: estimation of application execution time/remaining time. - posted by Mazen <ma...@gmail.com> on 2017/03/03 12:17:17 UTC, 0 replies.
- Problems when submitting a spark job via the REST API - posted by Kristinn Rúnarsson <kr...@activitystream.com> on 2017/03/03 13:22:15 UTC, 1 reply.
- How to run a spark on Pycharm - posted by Anahita Talebi <an...@gmail.com> on 2017/03/03 14:43:40 UTC, 5 replies.
- Re: Spark join over sorted columns of dataset. - posted by Rohit Verma <ro...@rokittech.com> on 2017/03/03 16:06:05 UTC, 2 replies.
- [RDDs and Dataframes] Equivalent expressions for RDD API - posted by Old-School <gi...@outlook.com> on 2017/03/04 13:59:31 UTC, 3 replies.
- Not able to remove header from a text file while creating a data frame . - posted by PS...@in.imshealth.com on 2017/03/04 14:42:57 UTC, 1 reply.
- Sharing my DataFrame (DataSet) cheat sheet. - posted by Yuhao Yang <hh...@gmail.com> on 2017/03/04 20:55:46 UTC, 1 reply.
- unsubscribe - posted by Howard Chen <ho...@microsoft.com.INVALID> on 2017/03/05 13:40:54 UTC, 0 replies.
- spark jobserver - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2017/03/05 15:39:25 UTC, 1 reply.
- Re: pyspark cluster mode on standalone deployment - posted by Ofer Eliassaf <of...@gmail.com> on 2017/03/05 18:43:09 UTC, 0 replies.
- Spark Beginner: Correct approach for use case - posted by Allan Richards <al...@gmail.com> on 2017/03/05 20:49:28 UTC, 4 replies.
- Re: [ANNOUNCE] Apache Bahir 2.1.0 Released - posted by kant kodali <ka...@gmail.com> on 2017/03/05 21:15:17 UTC, 0 replies.
- [Spark Streamiing] Streaming job failing consistently after 1h - posted by "Charles O. Bajomo" <ch...@pretechconsulting.co.uk> on 2017/03/06 02:37:29 UTC, 0 replies.
- Kafka failover with multiple data centers - posted by nguyen duc Tuan <ne...@gmail.com> on 2017/03/06 02:51:45 UTC, 2 replies.
- How do I deal with ever growing application log - posted by Timothy Chan <tc...@lumoslabs.com> on 2017/03/06 04:18:00 UTC, 2 replies.
- FPGrowth Model is taking too long to generate frequent item sets - posted by Raju Bairishetti <ra...@apache.org> on 2017/03/06 04:56:07 UTC, 5 replies.
- LinearRegressionModel - Negative Predicted Value - posted by Manish Maheshwari <my...@gmail.com> on 2017/03/06 08:05:26 UTC, 2 replies.
- Wrong runtime type when using newAPIHadoopFile in Java - posted by Nira <am...@gmail.com> on 2017/03/06 11:29:33 UTC, 4 replies.
- Spark application does not work with only one core - posted by Maximilien Belinga <ma...@wouri.co> on 2017/03/06 17:15:22 UTC, 1 reply.
- org.apache.spark.SparkException: Task not serializable - posted by Mina Aslani <as...@gmail.com> on 2017/03/06 21:06:21 UTC, 5 replies.
- Trouble with Thriftserver with hsqldb (Spark 2.1.0) - posted by Yana Kadiyska <ya...@gmail.com> on 2017/03/06 22:02:20 UTC, 0 replies.
- How does Spark provide Hive style bucketing support? - posted by SRK <sw...@gmail.com> on 2017/03/07 02:30:22 UTC, 0 replies.
- Spark Jobs under Cluster Manager - posted by Ankur Srivastava <an...@gmail.com> on 2017/03/07 02:40:14 UTC, 0 replies.
- Check if dataframe is empty - posted by AS...@nz.imshealth.com on 2017/03/07 03:52:06 UTC, 5 replies.
- Spark JDBC reads - posted by El-Hassan Wanas <el...@gmail.com> on 2017/03/07 11:04:34 UTC, 4 replies.
- How to unit test spark streaming? - posted by kant kodali <ka...@gmail.com> on 2017/03/07 12:04:51 UTC, 4 replies.
- (python) Spark .textFile(s3://…) access denied 403 with valid credentials - posted by Jonhy Stack <so...@gmail.com> on 2017/03/07 15:21:37 UTC, 1 reply.
- Huge partitioning job takes longer to close after all tasks finished - posted by Swapnil Shinde <sw...@gmail.com> on 2017/03/07 18:45:15 UTC, 4 replies.
- Does anybody use spark.rpc.io.mode=epoll? - posted by Steven Ruppert <st...@fullcontact.com> on 2017/03/07 19:46:23 UTC, 0 replies.
- Issues: Generate JSON with null values in Spark 2.0.x - posted by Chetan Khatri <ch...@gmail.com> on 2017/03/07 20:58:38 UTC, 3 replies.
- Structured Streaming - Kafka - posted by "Bowden, Chris" <ch...@hpe.com> on 2017/03/07 21:52:49 UTC, 2 replies.
- finding Spark Master - posted by Adaryl Wakefield <ad...@hotmail.com> on 2017/03/07 23:27:27 UTC, 5 replies.
- Spark job stopping abrubptly - posted by Divya Gehlot <di...@gmail.com> on 2017/03/08 01:47:15 UTC, 0 replies.
- PySpark Serialization/Deserialization (Pickling) Overhead - posted by Yeoul Na <ye...@uci.edu> on 2017/03/08 02:18:12 UTC, 2 replies.
- made spark job to throw exception still going under finished succeeded status in yarn - posted by nancy henry <na...@gmail.com> on 2017/03/08 02:25:52 UTC, 0 replies.
- Failed to connect to master ... - posted by Mina Aslani <as...@gmail.com> on 2017/03/08 04:33:44 UTC, 4 replies.
- Spark is inventing its own AWS secret key - posted by Jonhy Stack <so...@gmail.com> on 2017/03/08 11:14:59 UTC, 0 replies.
- spark executor memory, jvm config - posted by "TheGeorge1918 ." <zh...@gmail.com> on 2017/03/08 13:45:32 UTC, 1 reply.
- question on Write Ahead Log (Spark Streaming ) - posted by kant kodali <ka...@gmail.com> on 2017/03/08 16:58:44 UTC, 2 replies.
- Apparent memory leak involving count - posted by Facundo Domínguez <fa...@gmail.com> on 2017/03/08 17:02:28 UTC, 4 replies.
- Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”? - posted by Muhammad Haseeb Javed <11...@seecs.edu.pk> on 2017/03/08 17:55:50 UTC, 0 replies.
- Getting the methods registered with a SparkSession - posted by yael aharon <ya...@gmail.com> on 2017/03/08 22:42:20 UTC, 0 replies.
- (no subject) - posted by sathyanarayanan mudhaliyar <sa...@gmail.com> on 2017/03/09 03:00:06 UTC, 3 replies.
- spark-sql use case beginner question - posted by nancy henry <na...@gmail.com> on 2017/03/09 03:06:26 UTC, 4 replies.
- Spark failing while persisting sorted columns. - posted by Rohit Verma <ro...@rokittech.com> on 2017/03/09 09:41:09 UTC, 1 reply.
- Spark Jobs filling up the disk at SPARK_LOCAL_DIRS location - posted by kant kodali <ka...@gmail.com> on 2017/03/09 16:25:44 UTC, 0 replies.
- Question on Spark's graph libraries - posted by enzo <en...@smartinsightsfromdata.com> on 2017/03/09 17:42:54 UTC, 2 replies.
- How does preprocessing fit into Spark MLlib pipeline - posted by aATv <ad...@vidora.com> on 2017/03/09 19:02:05 UTC, 2 replies.
- Which streaming platform is best? Kafka or Spark Streaming? - posted by Gaurav1809 <ga...@gmail.com> on 2017/03/09 19:37:14 UTC, 5 replies.
- Distinct for Avro Key/Value PairRDD - posted by Alex Sulimanov <as...@Tremorvideo.com> on 2017/03/09 20:24:41 UTC, 0 replies.
- PickleException when collecting DataFrame containing empty bytearray() - posted by tot0 <lu...@gmail.com> on 2017/03/09 22:30:24 UTC, 1 reply.
- ml package data types - posted by jinhong lu <lu...@gmail.com> on 2017/03/10 03:24:21 UTC, 0 replies.
- Re: Case class with POJO - encoder issues - posted by geoHeil <ge...@gmail.com> on 2017/03/10 08:09:59 UTC, 0 replies.
- How can an RDD make its every elements to a new RDD ? - posted by Mars Xu <xu...@gmail.com> on 2017/03/10 09:57:55 UTC, 0 replies.
- Re: How to gracefully handle Kafka OffsetOutOfRangeException - posted by Ramkumar Venkataraman <ra...@gmail.com> on 2017/03/10 10:21:47 UTC, 4 replies.
- [Spark Streaming][Spark SQL] Design suggestions needed for sessionization - posted by Ramkumar Venkataraman <ra...@gmail.com> on 2017/03/10 10:44:04 UTC, 0 replies.
- Re: can spark take advantage of ordered data? - posted by sourabh chaki <ch...@gmail.com> on 2017/03/10 14:03:00 UTC, 3 replies.
- spark streaming with kafka source, how many concurrent jobs? - posted by shyla deshpande <de...@gmail.com> on 2017/03/10 18:02:52 UTC, 4 replies.
- How to improve performance of saveAsTextFile() - posted by "Parsian, Mahmoud" <mp...@illumina.com> on 2017/03/11 06:33:23 UTC, 1 reply.
- Re: java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext - posted by 萝卜丝炒饭 <14...@qq.com> on 2017/03/11 08:10:05 UTC, 1 reply.
- Re: Pyspark 2.1.0 weird behavior with repartition - posted by Olivier Girardot <o....@lateral-thoughts.com> on 2017/03/11 22:35:03 UTC, 0 replies.
- Spark thriff server hiveStatement.getQueryLog return empty - posted by 李斌松 <li...@gmail.com> on 2017/03/12 14:37:28 UTC, 0 replies.
- Differences between scikit-learn and Spark.ml for regression toy problem - posted by Frank Astier <fa...@linkedin.com.INVALID> on 2017/03/13 02:20:12 UTC, 3 replies.
- spark-streaming stopping - posted by sathyanarayanan mudhaliyar <sa...@gmail.com> on 2017/03/13 04:31:15 UTC, 0 replies.
- The speed of Spark streaming reading data from kafka stays low - posted by churly lin <ch...@gmail.com> on 2017/03/13 07:20:40 UTC, 1 reply.
- how to construct parameter for model.transform() from datafile - posted by jinhong lu <lu...@gmail.com> on 2017/03/13 08:31:17 UTC, 2 replies.
- Spark and continuous integration - posted by Sam Elamin <hu...@gmail.com> on 2017/03/13 09:55:56 UTC, 5 replies.
- keep or remove sc.stop() coz of RpcEnv already stopped error - posted by nancy henry <na...@gmail.com> on 2017/03/13 11:08:32 UTC, 2 replies.
- Adding metrics to spark datasource - posted by AssafMendelson <as...@rsa.com> on 2017/03/13 12:59:36 UTC, 0 replies.
- Sorted partition ranges without overlap - posted by Kristoffer Sjögren <st...@gmail.com> on 2017/03/13 13:34:10 UTC, 1 reply.
- Monitoring ongoing Spark Job when run in Yarn Cluster mode - posted by Sourav Mazumder <so...@gmail.com> on 2017/03/13 13:53:35 UTC, 2 replies.
- Graphframes PageRank ends up on 1 partition - posted by Olivier Girardot <o....@lateral-thoughts.com> on 2017/03/13 18:15:41 UTC, 0 replies.
- Structured Streaming - Can I start using it? - posted by Gaurav1809 <ga...@gmail.com> on 2017/03/13 18:21:39 UTC, 4 replies.
- Java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem - posted by Mina Aslani <as...@gmail.com> on 2017/03/13 21:45:49 UTC, 0 replies.
- Java Examples @ Spark github - posted by Mina Aslani <as...@gmail.com> on 2017/03/13 21:55:06 UTC, 0 replies.
- Online learning of LDA model in Spark (update an existing model) - posted by matd <ma...@gmail.com> on 2017/03/13 22:47:34 UTC, 0 replies.
- DataFrameWriter - Where to find list of Options applicable to particular format(datasource) - posted by Nirav Patel <np...@xactlycorp.com> on 2017/03/14 00:20:03 UTC, 2 replies.
- Re: [SparkSQL] too many open files although ulimit set to 1048576 - posted by darin <li...@foxmail.com> on 2017/03/14 02:44:53 UTC, 0 replies.
- [MLlib] kmeans random initialization, same seed every time - posted by Julian Keppel <ju...@gmail.com> on 2017/03/14 12:44:46 UTC, 2 replies.
- [MLlib] Multiple estimators for cross validation - posted by David Leifker <dl...@gmail.com> on 2017/03/14 13:07:10 UTC, 0 replies.
- OffsetOutOfRangeException - posted by Mohammad Kargar <mk...@phemi.com> on 2017/03/14 19:58:38 UTC, 0 replies.
- Setting Optimal Number of Spark Executor Instances - posted by kpeng1 <kp...@gmail.com> on 2017/03/14 22:30:49 UTC, 4 replies.
- Scaling Kafka Direct Streming application - posted by Pranav Shukla <pr...@brevitaz.com> on 2017/03/15 00:47:47 UTC, 1 reply.
- Re: how to construct parameter for model.transform() from datafile - posted by Yuhao Yang <hh...@gmail.com> on 2017/03/15 01:05:08 UTC, 0 replies.
- Fast write datastore... - posted by muthu <ba...@gmail.com> on 2017/03/15 06:04:12 UTC, 15 replies.
- apply UDFs to N columns dynamically in dataframe - posted by anup ahire <ah...@gmail.com> on 2017/03/15 06:04:57 UTC, 2 replies.
- Spark SQL Skip and Log bad records - posted by Aviral Agarwal <av...@gmail.com> on 2017/03/15 07:01:09 UTC, 0 replies.
- Setting spark.yarn.stagingDir in 1.6 - posted by Saurav Sinha <sa...@gmail.com> on 2017/03/15 10:36:47 UTC, 1 reply.
- Thrift Server as JDBC endpoint - posted by Sebastian Piu <se...@gmail.com> on 2017/03/15 12:37:01 UTC, 0 replies.
- [Spark CSV]: Use Custom TextInputFormat to Prevent Exceptions - posted by Nathan Case <nc...@gravyanalytics.com> on 2017/03/15 15:56:23 UTC, 2 replies.
- how to call recommend method from ml.recommendation.ALS - posted by lk_spark <lk...@163.com> on 2017/03/16 01:32:15 UTC, 4 replies.
- [Spark Streaming+Kafka][How-to] - posted by "OUASSAIDI, Sami" <sa...@mind7.fr> on 2017/03/16 12:16:21 UTC, 7 replies.
- Dataset : Issue with Save - posted by Bahubali Jain <ba...@gmail.com> on 2017/03/16 17:39:43 UTC, 5 replies.
- CSV empty columns handling in Spark 2.0.2 - posted by George Obama <fj...@gmail.com> on 2017/03/16 18:28:16 UTC, 1 reply.
- Hive on Spark Job Monitoring - posted by Ninad Shringarpure <ni...@cloudera.com> on 2017/03/16 19:13:28 UTC, 0 replies.
- [Spark Streaming] Checkpoint backup (.bk) file purpose - posted by Bartosz Konieczny <ba...@gmail.com> on 2017/03/16 19:16:33 UTC, 0 replies.
- Streaming 2.1.0 - window vs. batch duration - posted by Dominik Safaric <do...@gmail.com> on 2017/03/16 19:34:49 UTC, 2 replies.
- Spark 2.0.2 Dataset union() slowness vs RDD union? - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/03/16 21:55:03 UTC, 4 replies.
- Re: Streaming 2.1.0 - window vs. batch duration - posted by Michael Armbrust <mi...@databricks.com> on 2017/03/16 22:45:52 UTC, 1 reply.
- spark streaming exectors memory increasing and executor killed by yarn - posted by darin <li...@foxmail.com> on 2017/03/17 01:59:53 UTC, 4 replies.
- RDD can not convert to df, thanks - posted by 萝卜丝炒饭 <14...@qq.com> on 2017/03/17 03:48:20 UTC, 3 replies.
- Spark 2.0.2 - hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() - posted by Ravindra <ra...@gmail.com> on 2017/03/17 08:30:58 UTC, 2 replies.
- org.apache.spark.ui.jobs.UIData$TaskMetricsUIData - posted by Jiří Syrový <sy...@gmail.com> on 2017/03/17 10:17:03 UTC, 1 reply.
- UI Metrics data memory consumption - posted by xjrk <sy...@gmail.com> on 2017/03/17 10:36:03 UTC, 0 replies.
- Getting 2.0.2 for the link http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz - posted by George Obama <fj...@gmail.com> on 2017/03/17 16:55:19 UTC, 0 replies.
- HyperLogLogMonoid for unique visitor count in Spark Streaming - posted by SRK <sw...@gmail.com> on 2017/03/17 21:23:29 UTC, 0 replies.
- How to redistribute dataset without full shuffle - posted by Artur R <ar...@gpnxgroup.com> on 2017/03/17 21:52:01 UTC, 0 replies.
- Spark master IP on Kubernetes - posted by ffarozan <ff...@gmail.com> on 2017/03/18 04:30:05 UTC, 0 replies.
- Spark Streaming from Kafka, deal with initial heavy load. - posted by "sagarcasual ." <sa...@gmail.com> on 2017/03/18 04:53:02 UTC, 2 replies.
- SparkStreaming getActiveOrCreate - posted by Justin Pihony <ju...@gmail.com> on 2017/03/18 06:08:59 UTC, 0 replies.
- [Spark SQL & Core]: RDD to Dataset 1500 columns data with createDataFrame() throw exception of grows beyond 64 KB - posted by elevy <el...@gmail.com> on 2017/03/18 08:13:27 UTC, 2 replies.
- If TypedColumn is a subclass of Column, why I cannot apply function on it in Dataset? - posted by Yong Zhang <ja...@hotmail.com> on 2017/03/19 03:54:06 UTC, 0 replies.
- how to retain part of the features in LogisticRegressionModel (spark2.0) - posted by jinhong lu <lu...@gmail.com> on 2017/03/19 10:12:07 UTC, 1 reply.
- Re: how to retain part of the features in LogisticRegressionModel (spark2.0) - posted by Dhanesh Padmanabhan <dh...@gmail.com> on 2017/03/19 11:08:22 UTC, 4 replies.
- Contributing to Spark - posted by Sam Elamin <hu...@gmail.com> on 2017/03/19 22:38:44 UTC, 2 replies.
- Foreachpartition in spark streaming - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2017/03/20 06:20:54 UTC, 1 reply.
- Recombining output files in parallel - posted by Matt Deaver <ma...@gmail.com> on 2017/03/20 16:47:19 UTC, 0 replies.
- worker connected to standalone cluster are continuously crashing - posted by Diego Fanesi <di...@gmail.com> on 2017/03/21 02:55:10 UTC, 0 replies.
- Merging Schema while reading Parquet files - posted by Aditya Borde <bo...@gmail.com> on 2017/03/21 14:53:12 UTC, 1 reply.
- Easily creating custom encoders - posted by Ashic Mahtab <as...@live.com> on 2017/03/21 17:13:46 UTC, 1 reply.
- data cleaning and error routing - posted by vincent gromakowski <vi...@gmail.com> on 2017/03/21 17:15:32 UTC, 0 replies.
- Spark data frame map problem - posted by Shashank Mandil <ma...@gmail.com> on 2017/03/21 18:40:06 UTC, 1 reply.
- [SparkSQL] Project using NamedExpression - posted by Aviral Agarwal <av...@gmail.com> on 2017/03/21 20:15:14 UTC, 0 replies.
- Local spark context on an executor - posted by Shashank Mandil <ma...@gmail.com> on 2017/03/21 22:34:48 UTC, 5 replies.
- Spark Streaming questions, just 2 - posted by shyla deshpande <de...@gmail.com> on 2017/03/21 22:57:41 UTC, 0 replies.
- kafka and spark integration - posted by Adaryl Wakefield <ad...@hotmail.com> on 2017/03/22 07:08:23 UTC, 1 reply.
- [ Spark Streaming & Kafka 0.10 ] Possible bug - posted by "Afshartous, Nick" <na...@wbgames.com> on 2017/03/22 17:18:50 UTC, 0 replies.
- calculate diff of value and median in a group - posted by Craig Ching <cr...@gmail.com> on 2017/03/22 19:17:33 UTC, 5 replies.
- Custom Spark data source in Java - posted by Jean Georges Perrin <jg...@jgp.net> on 2017/03/22 19:27:59 UTC, 0 replies.
- Spark streaming to kafka exactly once - posted by Maurin Lenglart <ma...@cuberonlabs.com> on 2017/03/22 19:49:09 UTC, 3 replies.
- Re: Custom Spark data source in Java - posted by Jörn Franke <jo...@gmail.com> on 2017/03/22 20:00:22 UTC, 2 replies.
- Best way to deal with skewed partition sizes - posted by Matt Deaver <ma...@gmail.com> on 2017/03/22 20:30:11 UTC, 3 replies.
- Having issues reading a csv file into a DataSet using Spark 2.1 - posted by Keith Chapman <ke...@gmail.com> on 2017/03/22 23:18:21 UTC, 3 replies.
- Converting dataframe to dataset question - posted by shyla deshpande <de...@gmail.com> on 2017/03/23 01:07:36 UTC, 6 replies.
- Mismatch in data type comparision results full data in Spark - posted by santlal56 <sa...@bitwiseglobal.com> on 2017/03/23 06:19:59 UTC, 0 replies.
- Collaborative Filtering - scaling of the regularization parameter - posted by chris snow <ch...@gmail.com> on 2017/03/23 07:03:22 UTC, 3 replies.
- Persist RDD doubt - posted by nayan sharma <na...@gmail.com> on 2017/03/23 09:16:26 UTC, 1 reply.
- Re: Persist RDD doubt - posted by Jörn Franke <jo...@gmail.com> on 2017/03/23 09:55:48 UTC, 2 replies.
- [Worker Crashing] OutOfMemoryError: GC overhead limit execeeded - posted by Behroz Sikander <be...@gmail.com> on 2017/03/23 10:46:28 UTC, 6 replies.
- Re: GraphX Pregel API: add vertices and edges - posted by Robineast <Ro...@xense.co.uk> on 2017/03/23 11:11:59 UTC, 2 replies.
- Does spark's random forest need categorical features to be one hot encoded? - posted by Aseem Bansal <as...@gmail.com> on 2017/03/23 14:34:10 UTC, 1 reply.
- Application kill from UI do not propagate exception - posted by Noorul Islam Kamal Malmiyoda <no...@noorul.com> on 2017/03/23 15:21:12 UTC, 2 replies.
- [PySpark] - Binary File Partition - posted by jjayadeep <ja...@gmail.com> on 2017/03/23 15:36:25 UTC, 0 replies.
- [ANNOUNCE] Apache Gora 0.7 Release - posted by lewis john mcgibbney <le...@apache.org> on 2017/03/23 19:49:01 UTC, 0 replies.
- how to read object field within json file - posted by Selvam Raman <se...@gmail.com> on 2017/03/23 21:03:16 UTC, 4 replies.
- Spark dataframe, UserDefinedAggregateFunction(UDAF) help!! - posted by shyla deshpande <de...@gmail.com> on 2017/03/24 00:18:11 UTC, 3 replies.
- Re: LDA in Spark - posted by Joseph Bradley <jo...@databricks.com> on 2017/03/24 01:14:01 UTC, 0 replies.
- Re: Aggregated column name - posted by Kevin Mellott <ke...@gmail.com> on 2017/03/24 01:48:49 UTC, 1 reply.
- How to load "kafka" as a data source - posted by Gaurav1809 <ga...@gmail.com> on 2017/03/24 03:47:11 UTC, 1 reply.
- skipping header in multiple files - posted by nayan sharma <na...@gmail.com> on 2017/03/24 06:39:26 UTC, 0 replies.
- Spark 2.0.2 : Hang at "org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:623)" - posted by Ravindra <ra...@gmail.com> on 2017/03/24 10:40:45 UTC, 1 reply.
- spark-submit config via file - posted by ", Roy" <rp...@njit.edu> on 2017/03/24 11:38:11 UTC, 4 replies.
- unable to stream kafka messages - posted by kaniska <ka...@gmail.com> on 2017/03/24 17:49:38 UTC, 7 replies.
- Multiple cores/executors in Pyspark standalone mode - posted by Li Jin <ic...@gmail.com> on 2017/03/24 19:43:59 UTC, 3 replies.
- KMean clustering resulting Skewed Issue - posted by Reth RM <re...@gmail.com> on 2017/03/25 01:37:47 UTC, 2 replies.
- Intermittent issue while running Spark job through SparkLauncher - posted by Mohammad Tariq <do...@gmail.com> on 2017/03/26 00:00:10 UTC, 0 replies.
- What CI tool does Databricks use? - posted by kant kodali <ka...@gmail.com> on 2017/03/26 12:08:43 UTC, 0 replies.
- Collaborative filtering steps in spark - posted by chris snow <ch...@gmail.com> on 2017/03/26 22:48:58 UTC, 2 replies.
- spark 2.1.0 foreachRDD write slowly to HDFS - posted by "446463844@qq.com" <44...@qq.com> on 2017/03/27 03:17:22 UTC, 0 replies.
- Why selectExpr changes schema (to include id column)? - posted by Jacek Laskowski <ja...@japila.pl> on 2017/03/27 08:58:47 UTC, 3 replies.
- apache-spark: Converting List of Rows into Dataset Java - posted by Karin Valisova <ka...@datapine.com> on 2017/03/27 09:27:18 UTC, 3 replies.
- How to insert nano seconds in the TimestampType in Spark - posted by Devender Yadav <de...@impetus.co.in> on 2017/03/27 12:29:00 UTC, 1 reply.
- This is a test mail, please ignore! - posted by Noorul Islam K M <no...@noorul.com> on 2017/03/27 16:33:11 UTC, 0 replies.
- Upgrade the scala code using the most updated Spark version - posted by Anahita Talebi <an...@gmail.com> on 2017/03/27 19:25:50 UTC, 15 replies.
- Spark shuffle files - posted by Ashwin Sai Shankar <as...@netflix.com.INVALID> on 2017/03/27 19:38:52 UTC, 3 replies.
- Support Stored By Clause - posted by Denny Lee <de...@gmail.com> on 2017/03/27 23:45:30 UTC, 0 replies.
- spark streaming write orc suddenly slow? - posted by "446463844@qq.com" <44...@qq.com> on 2017/03/28 02:05:01 UTC, 0 replies.
- Utilities for Twitter Analysis? - posted by Gaurav1809 <ga...@gmail.com> on 2017/03/28 07:50:57 UTC, 0 replies.
- Writing dataframe to a final path using another temporary path - posted by yohann jardin <yo...@hotmail.com> on 2017/03/28 08:59:02 UTC, 0 replies.
- problem reading binary source with apache streaming when using JavaStreaminContext.binaryRecordsStream() - posted by Hamza HACHANI <ha...@supcom.tn> on 2017/03/28 09:25:39 UTC, 0 replies.
- question on DStreams - posted by kant kodali <ka...@gmail.com> on 2017/03/28 10:28:51 UTC, 0 replies.
- Groupby in fast in Impala than spark sql - any suggestions - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/03/28 14:35:41 UTC, 2 replies.
- dataframe join questions? - posted by shyla deshpande <de...@gmail.com> on 2017/03/28 21:57:53 UTC, 0 replies.
- GraphFrames 0.4.0 release, with Apache Spark 2.1 support - posted by Joseph Bradley <jo...@databricks.com> on 2017/03/28 23:42:33 UTC, 0 replies.
- Fwd: Spark UI circular redirect - posted by Vincent Ly <vs...@gmail.com> on 2017/03/28 23:42:55 UTC, 0 replies.
- Re: Shuffling on Dataframe to RDD conversion with a map transformation - posted by Patrick <ti...@gmail.com> on 2017/03/29 02:24:59 UTC, 0 replies.
- Re: dataframe join questions. Appreciate your input. - posted by shyla deshpande <de...@gmail.com> on 2017/03/29 06:47:58 UTC, 0 replies.
- Need help for RDD/DF transformation. - posted by Mungeol Heo <mu...@gmail.com> on 2017/03/29 09:37:19 UTC, 7 replies.
- Secondary Sort using Apache Spark 1.6 - posted by Pariksheet Barapatre <pb...@gmail.com> on 2017/03/29 13:02:23 UTC, 2 replies.
- Issues with partitionBy method on data frame writer SPARK 2.0.2 - posted by Luke Swift <lu...@googlemail.com> on 2017/03/29 13:34:58 UTC, 0 replies.
- httpclient conflict in spark - posted by Arvind Kandaswamy <ar...@gmail.com> on 2017/03/29 13:42:09 UTC, 2 replies.
- Re: Spark SQL, dataframe join questions. - posted by shyla deshpande <de...@gmail.com> on 2017/03/29 16:33:36 UTC, 3 replies.
- Spark streaming + kafka error with json library - posted by Srikanth <sr...@gmail.com> on 2017/03/29 16:59:23 UTC, 2 replies.
- Returning DataFrame for text file - posted by George Obama <fj...@gmail.com> on 2017/03/29 18:58:21 UTC, 0 replies.
- Alternatives for dataframe collectAsList() - posted by "szep.laszlo.it" <sz...@gmail.com> on 2017/03/29 19:00:06 UTC, 0 replies.
- Why VectorUDT private? - posted by Ryan <ry...@gmail.com> on 2017/03/30 02:57:04 UTC, 5 replies.
- How best we can store streaming data on dashboards for real time user experience? - posted by Gaurav1809 <ga...@gmail.com> on 2017/03/30 05:01:14 UTC, 7 replies.
- Will the setting for spark.default.parallelism be used for spark.sql.shuffle.output.partitions? - posted by shyla deshpande <de...@gmail.com> on 2017/03/30 16:58:05 UTC, 1 reply.
- spark kafka consumer with kerberos - posted by Bill Schwanitz <bi...@bilsch.org> on 2017/03/30 17:58:22 UTC, 2 replies.
- spark 2 and kafka consumer with ssl/kerberos - posted by Bill Schwanitz <bi...@bilsch.org> on 2017/03/30 19:24:25 UTC, 1 reply.
- Predicate not getting pusdhown to PrunedFilterScan - posted by Hanumath Rao Maduri <ha...@gmail.com> on 2017/03/30 23:30:37 UTC, 2 replies.
- Looking at EMR Logs - posted by Paul Tremblay <pa...@gmail.com> on 2017/03/31 00:45:59 UTC, 2 replies.
- dataframe filter, unable to bind variable - posted by shyla deshpande <de...@gmail.com> on 2017/03/31 02:15:32 UTC, 2 replies.
- Spark SQL 2.1 Complex SQL - Query Planning Issue - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2017/03/31 02:25:32 UTC, 3 replies.
- Parquet Filter PushDown - posted by Rahul Nandi <ra...@gmail.com> on 2017/03/31 05:31:37 UTC, 0 replies.
- How to PushDown ParquetFilter Spark 2.0.1 dataframe - posted by Rahul Nandi <ra...@gmail.com> on 2017/03/31 05:45:22 UTC, 1 reply.
- Research paper used in GraphX - posted by "Md. Rezaul Karim" <re...@insight-centre.org> on 2017/03/31 10:05:31 UTC, 0 replies.
- Partitioning in spark while reading from RDBMS via JDBC - posted by Devender Yadav <de...@impetus.co.in> on 2017/03/31 22:51:54 UTC, 0 replies.