user@spark.apache.org, 2016-06

- Re: Map tuple to case class in Dataset - posted by Saisai Shao <sa...@gmail.com> on 2016/06/01 02:21:37 UTC, 6 replies.
- Re: how to get file name of record being reading in spark - posted by Vikash Kumar <vi...@gmail.com> on 2016/06/01 03:56:20 UTC, 0 replies.
- Re: Accessing s3a files from Spark - posted by Gourav Sengupta <go...@gmail.com> on 2016/06/01 07:13:09 UTC, 0 replies.
- Spark Twitter Stream throws Null Pointer Exception - posted by mayankshete <ma...@yash.com> on 2016/06/01 07:37:08 UTC, 1 replies.
- Re: About a problem when mapping a file located within a HDFS vmware cdh-5.7 image - posted by Alonso Isidoro Roman <al...@gmail.com> on 2016/06/01 07:53:19 UTC, 0 replies.
- Windows Rstudio to Linux spakR - posted by Selvam Raman <se...@gmail.com> on 2016/06/01 08:55:18 UTC, 1 replies.
- Shuffle Service - Connection Inactive - Creating new one - posted by krishmah <kr...@gmail.com> on 2016/06/01 11:13:36 UTC, 0 replies.
- hivecontext and date format - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/01 11:16:07 UTC, 1 replies.
- Ignore features in Random Forest - posted by Neha Mehta <ne...@gmail.com> on 2016/06/01 13:18:41 UTC, 2 replies.
- Re: Spark Job Execution halts during shuffle... - posted by Priya Ch <le...@gmail.com> on 2016/06/01 13:30:30 UTC, 0 replies.
- Switching broadcast mechanism from torrrent - posted by Daniel Haviv <da...@veracity-group.com> on 2016/06/01 14:48:21 UTC, 7 replies.
- Best Practices for Spark Join - posted by Aakash Basu <ra...@gmail.com> on 2016/06/01 15:00:59 UTC, 0 replies.
- ImportError: No module named numpy - posted by Bhupendra Mishra <bh...@gmail.com> on 2016/06/01 15:31:07 UTC, 12 replies.
- Symbolic links in Spark - posted by Marco1982 <ma...@yahoo.it> on 2016/06/01 15:36:07 UTC, 0 replies.
- Saprk 1.6 Driver Memory Issue - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/06/01 15:41:35 UTC, 3 replies.
- Using data frames to join separate RDDs in spark streaming - posted by Cyril Scetbon <cy...@free.fr> on 2016/06/01 16:00:08 UTC, 2 replies.
- Dataset Outer Join vs RDD Outer Join - posted by Richard Marscher <rm...@localytics.com> on 2016/06/01 16:58:08 UTC, 6 replies.
- How to enable core dump in spark - posted by prateek arora <pr...@gmail.com> on 2016/06/01 17:55:32 UTC, 4 replies.
- Re: Spark streaming readind avro from kafka - posted by justneeraj <ju...@gmail.com> on 2016/06/01 18:02:25 UTC, 2 replies.
- get and append file name in record being reading - posted by Vikash Kumar <vi...@gmail.com> on 2016/06/01 19:13:20 UTC, 1 replies.
- Re: Spark input size when filtering on parquet files - posted by Dennis Hunziker <de...@gmail.com> on 2016/06/01 20:51:05 UTC, 1 replies.
- --driver-cores for Standalone and YARN only?! What about Mesos? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/06/01 21:18:40 UTC, 2 replies.
- Re: StackOverflow in Spark - posted by Matthew Young <ta...@gmail.com> on 2016/06/02 02:45:18 UTC, 5 replies.
- How should I interpret Spark's toDebugString()? - posted by dimoes <di...@gmail.com> on 2016/06/02 04:27:39 UTC, 0 replies.
- Spark Streaming join - posted by karthik tunga <ka...@gmail.com> on 2016/06/02 06:48:58 UTC, 0 replies.
- Re: spark-submit hive connection through spark Initial job has not accepted any resources - posted by vinayak <vi...@tcs.com> on 2016/06/02 07:07:13 UTC, 0 replies.
- Fwd: Beeline - Spark thrift server user retrieval Issue - posted by pooja mehta <sp...@gmail.com> on 2016/06/02 07:57:11 UTC, 0 replies.
- Container preempted by scheduler - Spark job error - posted by "Prabeesh K." <pr...@gmail.com> on 2016/06/02 08:32:20 UTC, 4 replies.
- Spark support for update/delete operations on Hive ORC transactional tables - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/02 08:54:14 UTC, 5 replies.
- Stream reading from database using spark streaming - posted by Zakaria Hili <za...@gmail.com> on 2016/06/02 13:26:35 UTC, 3 replies.
- How to generate seeded random numbers in GraphX Pregel API vertex procedure? - posted by Roman Pastukhov <me...@gmail.com> on 2016/06/02 14:20:55 UTC, 1 replies.
- Partitioning Data to optimize combineByKey - posted by Nathan Case <nc...@timerazor.com> on 2016/06/02 14:21:12 UTC, 0 replies.
- how to increase threads per executor - posted by Andres M Jimenez T <ad...@hotmail.com> on 2016/06/02 16:29:15 UTC, 5 replies.
- Seeking advice on realtime querying over JDBC - posted by Sunita Arvind <su...@gmail.com> on 2016/06/02 17:47:26 UTC, 2 replies.
- MLLIB, Random Forest and user defined loss function? - posted by xweb <as...@gmail.com> on 2016/06/02 20:33:06 UTC, 1 replies.
- Classpath hell and Elasticsearch 2.3.2... - posted by Kevin Burton <bu...@spinn3r.com> on 2016/06/02 22:34:11 UTC, 7 replies.
- How to share cached tables when the Thrift server runs in multi-session mode in spark 1.6 - posted by 谭成灶 <ta...@live.cn> on 2016/06/03 07:41:35 UTC, 0 replies.
- twitter data analysis - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/03 08:26:49 UTC, 2 replies.
- About a problem running a spark job in a cdh-5.7.0 vmware image. - posted by Alonso <al...@gmail.com> on 2016/06/03 10:39:17 UTC, 8 replies.
- Re: java.io.FileNotFoundException - posted by kishore kumar <ak...@gmail.com> on 2016/06/03 12:35:58 UTC, 3 replies.
- np.unique and collect - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/03 13:07:27 UTC, 1 replies.
- [REPOST] Severe Spark Streaming performance degradation after upgrading to 1.6.1 - posted by Adrian Tanase <at...@adobe.com> on 2016/06/03 13:13:33 UTC, 7 replies.
- Save to a Partitioned Table using a Derived Column - posted by Benjamin Kim <bb...@gmail.com> on 2016/06/03 13:13:58 UTC, 7 replies.
- Spark Streaming w/variables used as dynamic queries - posted by Cyril Scetbon <cy...@free.fr> on 2016/06/03 14:35:38 UTC, 0 replies.
- JavaDStream to Dataframe: Java - posted by Zakaria Hili <za...@gmail.com> on 2016/06/03 14:44:21 UTC, 1 replies.
- Strategies for propery load-balanced partitioning - posted by Sa...@wellsfargo.com on 2016/06/03 15:31:02 UTC, 4 replies.
- Custom positioning/partitioning Dataframes - posted by Nilesh Chakraborty <ni...@nileshc.com> on 2016/06/03 16:09:42 UTC, 1 replies.
- Scheduler Delay Time - posted by alvarobrandon <al...@gmail.com> on 2016/06/03 16:36:52 UTC, 2 replies.
- TrackStateByKey operation for Python - posted by cmbendre <ch...@gmail.com> on 2016/06/03 19:00:22 UTC, 0 replies.
- Spark Streaming - long garbage collection time - posted by Marco1982 <ma...@yahoo.it> on 2016/06/03 19:19:24 UTC, 1 replies.
- Spark SQL Nested Array of JSON with empty field - posted by Jerry Wong <je...@gmail.com> on 2016/06/03 19:31:31 UTC, 2 replies.
- Twitter streaming error : No lease on /user/hduser/checkpoint/temp (inode 806125): File does not exist. - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/03 19:48:12 UTC, 6 replies.
- Does Spark uses data locality information from HDFS when running in standalone mode? - posted by Marco Capuccini <ma...@farmbio.uu.se> on 2016/06/05 09:50:23 UTC, 5 replies.
- Basic question on using one's own classes in the Scala app - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/06/05 10:06:23 UTC, 8 replies.
- Caching table partition after join - posted by "Zalzberg, Idan (Agoda)" <Id...@agoda.com> on 2016/06/05 12:55:23 UTC, 0 replies.
- Re: ML regression - spark context dies without error - posted by Yanbo Liang <yb...@gmail.com> on 2016/06/05 18:25:28 UTC, 0 replies.
- Akka with Hadoop/Spark - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/06/05 19:39:14 UTC, 1 replies.
- StackOverflowError even with JavaSparkContext union(JavaRDD... rdds) - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/06/05 20:17:38 UTC, 2 replies.
- Performance of Spark/MapReduce - posted by Deepak Goel <de...@gmail.com> on 2016/06/05 21:37:54 UTC, 2 replies.
- Specify node where driver should run - posted by Saiph Kappa <sa...@gmail.com> on 2016/06/05 23:54:39 UTC, 11 replies.
- RE: GraphX Java API - posted by Santoshakhilesh <sa...@huawei.com> on 2016/06/06 01:39:26 UTC, 1 replies.
- Unsubscribe - posted by goutham koneru <go...@gmail.com> on 2016/06/06 02:30:14 UTC, 10 replies.
- Timeline for supporting basic operations like groupBy, joins etc on Streaming DataFrames - posted by raaggarw <ra...@adobe.com> on 2016/06/06 05:30:35 UTC, 5 replies.
- Unable to set ContextClassLoader in spark shell - posted by shengzhixia <sh...@gmail.com> on 2016/06/06 11:22:46 UTC, 1 replies.
- Re: How to modify collection inside a spark rdd foreach - posted by Robineast <Ro...@xense.co.uk> on 2016/06/06 11:23:43 UTC, 0 replies.
- Spark SQL - Encoders - case class - posted by Dave Maughan <da...@gmail.com> on 2016/06/06 12:13:33 UTC, 2 replies.
- Logging from transformations in PySpark - posted by Michael Ravits <mi...@gmail.com> on 2016/06/06 13:54:35 UTC, 0 replies.
- subscribe - posted by Kishorkumar Patil <kp...@yahoo-inc.com.INVALID> on 2016/06/06 15:42:05 UTC, 0 replies.
- groupByKey returns an emptyRDD - posted by Daniel Haviv <da...@veracity-group.com> on 2016/06/06 17:43:09 UTC, 1 replies.
- Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8 - posted by verylucky Man <ve...@gmail.com> on 2016/06/06 19:31:19 UTC, 9 replies.
- identifying newly arrived files in s3 in spark streaming - posted by pandees waran <pa...@gmail.com> on 2016/06/07 03:21:09 UTC, 0 replies.
- Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set() - posted by Dominik Safaric <do...@gmail.com> on 2016/06/07 09:06:09 UTC, 13 replies.
- Advice on Scaling RandomForest - posted by Franc Carter <fr...@gmail.com> on 2016/06/07 11:09:33 UTC, 2 replies.
- Analyzing twitter data - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/07 11:16:40 UTC, 11 replies.
- Re: Spark 2.0 Release Date - posted by Arun Patel <ar...@gmail.com> on 2016/06/07 11:25:25 UTC, 9 replies.
- Spark_Usecase - posted by Ajay Chander <it...@gmail.com> on 2016/06/07 14:09:21 UTC, 8 replies.
- RESOLVED - Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set() - posted by Dominik Safaric <do...@gmail.com> on 2016/06/07 15:55:20 UTC, 1 replies.
- Integrating spark source in an eclipse project? - posted by Cesar Flores <ce...@gmail.com> on 2016/06/07 16:04:37 UTC, 1 replies.
- Spark SQL: org.apache.spark.sql.AnalysisException: cannot resolve "some columns" given input columns. - posted by Jerry Wong <je...@gmail.com> on 2016/06/07 17:18:49 UTC, 1 replies.
- SparkR interaction with R libraries (currently 1.5.2) - posted by rachmaninovquartet <ra...@gmail.com> on 2016/06/07 17:58:12 UTC, 1 replies.
- Environment tab meaning - posted by satish saley <sa...@gmail.com> on 2016/06/07 18:11:38 UTC, 4 replies.
- Spark dynamic allocation - efficiently request new resource - posted by Nirav Patel <np...@xactlycorp.com> on 2016/06/07 18:13:27 UTC, 0 replies.
- Dataset - reduceByKey - posted by Bryan Jeffrey <br...@gmail.com> on 2016/06/07 18:32:58 UTC, 5 replies.
- MapType in Java unsupported in Spark 1.5 - posted by Baichuan YANG <yb...@gmail.com> on 2016/06/07 18:54:04 UTC, 0 replies.
- setting column names on dataset - posted by Koert Kuipers <ko...@tresata.com> on 2016/06/07 19:30:42 UTC, 2 replies.
- Affinity Propagation - posted by Tim Gautier <ti...@gmail.com> on 2016/06/07 19:55:16 UTC, 0 replies.
- Re: Join two Spark Streaming - posted by vinay453 <vi...@gmail.com> on 2016/06/07 20:15:35 UTC, 0 replies.
- Optional columns in Aggregated Metrics by Executor in web UI? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/06/07 21:35:50 UTC, 0 replies.
- Sequential computation over several partitions - posted by Jeroen Miller <bl...@gmail.com> on 2016/06/07 21:54:51 UTC, 2 replies.
- Dealing with failures - posted by Mohit Anchlia <mo...@gmail.com> on 2016/06/08 00:38:43 UTC, 2 replies.
- Apache design patterns - posted by Francois Le Roux <le...@gmail.com> on 2016/06/08 03:33:35 UTC, 3 replies.
- Apache design pattern approaches - posted by sparkie <le...@gmail.com> on 2016/06/08 04:02:52 UTC, 0 replies.
- Spark streaming micro batch failure handling - posted by aviemzur <av...@gmail.com> on 2016/06/08 06:49:43 UTC, 0 replies.
- oozie and spark on yarn - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/08 07:40:18 UTC, 2 replies.
- Trainning a spark ml linear regresion model fail after migrating from 1.5.2 to 1.6.1 - posted by philippe v <gl...@gmail.com> on 2016/06/08 08:22:05 UTC, 3 replies.
- unsubscribe - posted by Amal Babu <am...@gmail.com> on 2016/06/08 09:25:00 UTC, 3 replies.
- OneVsRest SVM - Very Low F-Measure compared to OneVsRest Logistic Regression - posted by Hayri Volkan Agun <vo...@gmail.com> on 2016/06/08 10:23:48 UTC, 0 replies.
- comparaing row in pyspark data frame - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/08 12:05:15 UTC, 2 replies.
- Re: SQL JSON array operations - posted by amalik <as...@kreditech.com> on 2016/06/08 14:15:32 UTC, 0 replies.
- When queried through hiveContext, does hive executes these queries using its execution engine (default is map-reduce), or spark just reads the data and performs those queries itself? - posted by Himanshu Mehra <hi...@infoobjects.com> on 2016/06/08 14:34:24 UTC, 4 replies.
- Seq.toDF vs sc.parallelize.toDF = no Spark job vs one - why? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/06/08 14:49:37 UTC, 2 replies.
- HiveContext: Unable to load AWS credentials from any provider in the chain - posted by Daniel Haviv <da...@veracity-group.com> on 2016/06/08 15:34:19 UTC, 5 replies.
- ChiSqSelector Selected Features Indicies - posted by Sebastian Kuepers <se...@publicispixelpark.de> on 2016/06/08 16:05:25 UTC, 0 replies.
- Creating a Hive table through Spark and potential locking issue (a bug) - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/08 18:32:53 UTC, 7 replies.
- UnsupportedOperationException: converting from RDD to DataSets on 1.6.1 - posted by Peter Halliday <pj...@cornell.edu> on 2016/06/08 19:45:29 UTC, 2 replies.
- UDTRegistration - posted by pgrandjean <pa...@yahoo.fr> on 2016/06/08 20:16:24 UTC, 0 replies.
- Variable in UpdateStateByKey Not Updating After Restarting Application from Checkpoint - posted by Joe Panciera <jo...@gmail.com> on 2016/06/08 20:27:35 UTC, 1 replies.
- Write Ahead Log - posted by Mohit Anchlia <mo...@gmail.com> on 2016/06/08 22:14:24 UTC, 2 replies.
- [ Standalone Spark Cluster ] - Track node status - posted by Rutuja Kulkarni <ru...@gmail.com> on 2016/06/08 22:56:37 UTC, 7 replies.
- Spark 2.0 Streaming and Event Time - posted by Chang Lim <ch...@gmail.com> on 2016/06/08 23:12:19 UTC, 3 replies.
- Spark Partition by Columns doesn't work properly - posted by Chanh Le <gi...@gmail.com> on 2016/06/09 04:08:23 UTC, 4 replies.
- Spark Streaming stateful operation to HBase - posted by soumick dasgupta <so...@gmail.com> on 2016/06/09 05:58:02 UTC, 1 replies.
- data frame or RDD for machine learning - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/09 07:12:31 UTC, 2 replies.
- OutOfMemory when doing joins in spark 2.0 while same code runs fine in spark 1.5.2 - posted by raaggarw <ra...@adobe.com> on 2016/06/09 09:53:34 UTC, 3 replies.
- Error while using checkpointing . Spark streaming 1.5.2- DStream checkpointing has been enabled but the DStreams with their functions are not serialisable - posted by sandesh deshmane <sa...@gmail.com> on 2016/06/09 10:57:38 UTC, 2 replies.
- Spark ML Word2Vec Serialization Issues - posted by sharad82 <kh...@gmail.com> on 2016/06/09 11:21:05 UTC, 0 replies.
- Saving Parquet files to S3 - posted by Ankur Jain <an...@yash.com> on 2016/06/09 11:51:52 UTC, 4 replies.
- HIVE Query 25x faster than SPARK Query - posted by Gourav Sengupta <go...@gmail.com> on 2016/06/09 14:14:04 UTC, 15 replies.
- spark on mesos cluster - metrics with graphite sink - posted by Lior Chaga <li...@taboola.com> on 2016/06/09 14:33:30 UTC, 0 replies.
- JIRA SPARK-2984 - posted by Sunil Kumar <pa...@yahoo.com.INVALID> on 2016/06/09 15:56:27 UTC, 4 replies.
- Spark ML - Is it safe to schedule two trainings job at the same time or will worker state be corrupted? - posted by Brandon White <bw...@gmail.com> on 2016/06/09 17:28:34 UTC, 1 replies.
- Spark Streaming getting slower - posted by John Simon <jo...@tapjoy.com> on 2016/06/09 18:09:47 UTC, 1 replies.
- Re: Processing Time Spikes (Spark Streaming) - posted by "christian.dancuart@rbc.com" <ch...@rbc.com> on 2016/06/09 18:38:45 UTC, 0 replies.
- Re: Spark Streaming heap space out of memory - posted by "christian.dancuart@rbc.com" <ch...@rbc.com> on 2016/06/09 18:43:28 UTC, 0 replies.
- How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL? - posted by SRK <sw...@gmail.com> on 2016/06/09 20:01:45 UTC, 15 replies.
- how to store results of Scala Query in Text format or tab delimiter - posted by Mahender Sarangam <Ma...@outlook.com> on 2016/06/09 20:11:24 UTC, 0 replies.
- Issues when using the streaming checkpoint - posted by Natu Lauchande <nl...@gmail.com> on 2016/06/09 20:57:19 UTC, 0 replies.
- Re: pyspark.GroupedData.agg works incorrectly when one column is aggregated twice? - posted by Davies Liu <da...@databricks.com> on 2016/06/10 00:00:52 UTC, 0 replies.
- Error writing parquet to S3 - posted by Peter Halliday <pj...@cornell.edu> on 2016/06/10 04:00:35 UTC, 1 replies.
- SparkR : glm model - posted by april_ZMQ <mq...@mais.smu.edu.sg> on 2016/06/10 04:14:52 UTC, 1 replies.
- Catalyst optimizer cpu/Io cost - posted by Srinivasan Hariharan02 <Sr...@infosys.com> on 2016/06/10 06:29:34 UTC, 8 replies.
- - posted by pooja mehta <sp...@gmail.com> on 2016/06/10 06:53:10 UTC, 2 replies.
- Spark Installation to work on Spark Streaming and MLlib - posted by Ram Krishna <ra...@gmail.com> on 2016/06/10 07:20:06 UTC, 4 replies.
- Spark Getting data from MongoDB in JAVA - posted by Asfandyar Ashraf Malik <as...@kreditech.com> on 2016/06/10 09:39:48 UTC, 5 replies.
- Kmeans Streaming process flow - posted by "Chandra Mohan, Ananda Vel Murugan" <An...@honeywell.com> on 2016/06/10 11:04:35 UTC, 0 replies.
- IndexedRowMatrix multiplication with a local Sparse Matrix - posted by RuchitaGoyal <go...@gmail.com> on 2016/06/10 13:17:35 UTC, 0 replies.
- Java MongoDB Spark Stratio (Please give me a hint) - posted by Umair Janjua <um...@gmail.com> on 2016/06/10 13:36:05 UTC, 2 replies.
- Re: word2vec: how to save an mllib model and reload it? - posted by sharad82 <kh...@gmail.com> on 2016/06/10 15:23:24 UTC, 0 replies.
- SAS_TO_SPARK_SQL_(Could be a Bug?) - posted by Ajay Chander <it...@gmail.com> on 2016/06/10 16:07:53 UTC, 9 replies.
- Long Running Spark Streaming getting slower - posted by "john.simon" <jo...@tapjoy.com> on 2016/06/10 17:21:51 UTC, 14 replies.
- Cleaning spark memory - posted by Cesar Flores <ce...@gmail.com> on 2016/06/10 18:18:44 UTC, 1 replies.
- Pls assist: Spark DecisionTree question - posted by Marco Mistroni <mm...@gmail.com> on 2016/06/10 18:39:17 UTC, 0 replies.
- DataFrame.foreach(scala.Function1) example - posted by Mohammad Tariq <do...@gmail.com> on 2016/06/10 19:51:40 UTC, 0 replies.
- Updated Spark logo - posted by Matei Zaharia <ma...@gmail.com> on 2016/06/10 20:01:28 UTC, 0 replies.
- Neither previous window has value for key, nor new values found. - posted by Marco1982 <ma...@yahoo.it> on 2016/06/10 20:19:40 UTC, 1 replies.
- Slow collecting of large Spark Data Frames into R - posted by Jonathan Mortensen <jm...@nuna.com.INVALID> on 2016/06/11 00:31:50 UTC, 1 replies.
- Neither previous window has value for key, nor new values found - posted by Marco Platania <ma...@yahoo.it.INVALID> on 2016/06/11 00:34:01 UTC, 0 replies.
- Re: Cleaning spark memory - posted by Ricky <49...@qq.com> on 2016/06/11 04:17:05 UTC, 0 replies.
- Big Data Interview - posted by Chaturvedi Chola <ch...@gmail.com> on 2016/06/11 06:54:49 UTC, 0 replies.
- Book for Machine Learning (MLIB and other libraries on Spark) - posted by Deepak Goel <de...@gmail.com> on 2016/06/11 15:04:26 UTC, 7 replies.
- Running Spark in Standalone or local modes - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/06/11 18:08:47 UTC, 7 replies.
- Accuracy of BinaryClassificationMetrics - posted by Marco Mistroni <mm...@gmail.com> on 2016/06/11 20:08:42 UTC, 0 replies.
- Should I avoid "state" in an Spark application? - posted by Haopu Wang <HW...@qilinsoft.com> on 2016/06/12 08:40:09 UTC, 2 replies.
- How to use a model generated early in a stage in ML pipelines - posted by Hayri Volkan Agun <vo...@gmail.com> on 2016/06/12 09:02:13 UTC, 0 replies.
- Several questions about how pyspark.ml works - posted by XapaJIaMnu <nh...@gmail.com> on 2016/06/12 10:08:40 UTC, 0 replies.
- OutOfMemoryError - When saving Word2Vec - posted by sharad82 <kh...@gmail.com> on 2016/06/12 10:08:54 UTC, 3 replies.
- Questions about Spark Worker - posted by East Evil <su...@sina.com> on 2016/06/12 13:12:44 UTC, 2 replies.
- What is the interpretation of Cores in Spark doc - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/12 19:23:36 UTC, 13 replies.
- Spark Thrift Server in CDH 5.3 - posted by pooja mehta <sp...@gmail.com> on 2016/06/13 04:51:34 UTC, 1 replies.
- Spark Kafka stream processing time increasing gradually - posted by Roshan Singh <si...@gmail.com> on 2016/06/13 06:19:27 UTC, 4 replies.
- Dataframe : Column features must be of type org.apache.spark.mllib.linalg.VectorUDT - posted by Zakaria Hili <za...@gmail.com> on 2016/06/13 07:55:50 UTC, 0 replies.
- Hive 1.0.0 not able to read Spark 1.6.1 parquet output files on EMR 4.7.0 - posted by mayankshete <ma...@yash.com> on 2016/06/13 07:56:43 UTC, 1 replies.
- Spark 2.0.0 : GLM problem - posted by april_ZMQ <mq...@mais.smu.edu.sg> on 2016/06/13 08:14:37 UTC, 2 replies.
- Another problem about parallel computing - posted by hero <su...@sina.com> on 2016/06/13 08:24:18 UTC, 2 replies.
- cluster mode for Python on standalone clusters - posted by Jan Sourek <ja...@performio.cz> on 2016/06/13 09:16:35 UTC, 0 replies.
- cluster mode for Python on standalone cluster - posted by Jan Šourek <ja...@performio.cz> on 2016/06/13 09:24:41 UTC, 1 replies.
- Spark Streamming checkpoint and restoring files from S3 - posted by Natu Lauchande <nl...@gmail.com> on 2016/06/13 09:57:09 UTC, 1 replies.
- Spark 2.0: Unify DataFrames and Datasets question - posted by Arun Patel <ar...@gmail.com> on 2016/06/13 11:01:15 UTC, 5 replies.
- Spark Streaming application failing with Kerboros issue while writing data to HBase - posted by Kamesh <ka...@gmail.com> on 2016/06/13 11:44:34 UTC, 2 replies.
- Kafka Exceptions - posted by Bryan Jeffrey <br...@gmail.com> on 2016/06/13 13:25:20 UTC, 3 replies.
- Basic question. Access MongoDB data in Spark. - posted by Umair Janjua <um...@gmail.com> on 2016/06/13 15:09:02 UTC, 3 replies.
- Re: java.lang.StackOverflowError when calling count() - posted by Anuj <wr...@gmail.com> on 2016/06/13 15:18:41 UTC, 0 replies.
- Computing on each partition/executor with "persistent" data - posted by Jeroen Miller <bl...@gmail.com> on 2016/06/13 17:35:29 UTC, 0 replies.
- Is there a limit on the number of tasks in one job? - posted by "khaled.hammouda" <kh...@kik.com> on 2016/06/13 18:19:30 UTC, 5 replies.
- LegacyAccumulatorWrapper basically requires the Accumulator value to implement equlas() or it will fail on isZero() - posted by Amit Sela <am...@gmail.com> on 2016/06/13 19:15:24 UTC, 2 replies.
- how to investigate skew and DataFrames and RangePartitioner - posted by Peter Halliday <pj...@cornell.edu> on 2016/06/13 20:04:30 UTC, 1 replies.
- Spark Memory Error - Not enough space to cache broadcast - posted by Cassa L <lc...@gmail.com> on 2016/06/13 21:56:32 UTC, 8 replies.
- Suggestions on Lambda Architecture in Spark - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/06/13 22:52:52 UTC, 3 replies.
- Re: Limit pyspark.daemon threads - posted by agateaaa <ag...@gmail.com> on 2016/06/14 07:44:43 UTC, 7 replies.
- Spark corrupts text lines - posted by Kristoffer Sjögren <st...@gmail.com> on 2016/06/14 09:24:08 UTC, 10 replies.
- Create external table with partitions using sqlContext.createExternalTable - posted by Patrick Duin <pa...@gmail.com> on 2016/06/14 10:39:48 UTC, 3 replies.
- Spark 2.0 Preview After caching query didn't work and can't kill job. - posted by Chanh Le <gi...@gmail.com> on 2016/06/14 10:45:08 UTC, 4 replies.
- Running streaming applications in Production environment - posted by "Mail.com" <pr...@mail.com> on 2016/06/14 11:37:34 UTC, 0 replies.
- [Spark 2.0.0] Structured Stream on Kafka - posted by andy petrella <an...@gmail.com> on 2016/06/14 14:21:12 UTC, 3 replies.
- RBM in mllib - posted by Roberto Pagliari <ro...@asos.com> on 2016/06/14 14:23:44 UTC, 1 replies.
- MAtcheERROR : STRINGTYPE - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/14 16:54:26 UTC, 1 replies.
- Spark SQL NoSuchMethodException...DriverWrapper.() - posted by Mirko Bernardoni <mi...@ixxus.com> on 2016/06/14 17:12:51 UTC, 3 replies.
- Spark-SQL with Oozie - posted by chandana <ch...@gmail.com> on 2016/06/14 17:14:09 UTC, 1 replies.
- restarting of spark streaming - posted by "Chen, Yan I" <ya...@rbc.com> on 2016/06/14 17:34:13 UTC, 1 replies.
- spark-ec2 scripts with spark-2.0.0-preview - posted by Sunil Kumar <pa...@yahoo.com.INVALID> on 2016/06/14 19:07:12 UTC, 1 replies.
- SparkContext#cancelJobGroup : is it safe? Who got burn? Who is alive? - posted by Bertrand Dechoux <de...@gmail.com> on 2016/06/14 19:13:49 UTC, 0 replies.
- choice of RDD function - posted by Sivakumaran S <si...@me.com> on 2016/06/14 20:29:32 UTC, 10 replies.
- spark standalone High availibilty issues - posted by Darshan Singh <da...@gmail.com> on 2016/06/14 20:56:13 UTC, 2 replies.
- Writing empty Dataframes doesn't save any _metadata files in Spark 1.5.1 and 1.6 - posted by antoniosi <an...@gmail.com> on 2016/06/14 23:46:19 UTC, 2 replies.
- can not show all data for this table - posted by Lee Ho Yeung <jo...@gmail.com> on 2016/06/15 01:19:38 UTC, 5 replies.
- hivecontext error - posted by Tejaswini Buche <te...@gmail.com> on 2016/06/15 01:33:31 UTC, 1 replies.
- streaming example has error - posted by Lee Ho Yeung <jo...@gmail.com> on 2016/06/15 01:34:00 UTC, 3 replies.
- sqlcontext - not able to connect to database - posted by Tejaswini Buche <te...@gmail.com> on 2016/06/15 04:45:16 UTC, 1 replies.
- Spark SQL driver memory keeps rising - posted by Khaled Hammouda <kh...@kik.com> on 2016/06/15 05:22:53 UTC, 4 replies.
- how do I set TBLPROPERTIES in dataFrame.saveAsTable()? - posted by Yang <te...@gmail.com> on 2016/06/15 08:02:52 UTC, 0 replies.
- can spark help to prevent memory error for itertools.combinations(initlist, 2) in python script - posted by Lee Ho Yeung <jo...@gmail.com> on 2016/06/15 10:02:55 UTC, 0 replies.
- Adding h5 files in a zip to use with PySpark - posted by ar7 <as...@gmail.com> on 2016/06/15 10:50:34 UTC, 2 replies.
- Handle empty kafka in Spark Streaming - posted by Yogesh Vyas <in...@gmail.com> on 2016/06/15 11:30:41 UTC, 2 replies.
- Spark 2.0 release date - posted by Chaturvedi Chola <ch...@gmail.com> on 2016/06/15 11:45:45 UTC, 5 replies.
- Substract two DStreams - posted by Matthias Niehoff <ma...@codecentric.de> on 2016/06/15 13:18:08 UTC, 2 replies.
- Is that normal spark performance? - posted by "nikita.dobryukha" <n....@gmail.com> on 2016/06/15 14:01:11 UTC, 2 replies.
- Get both feature importance and ROC curve from a random forest classifier - posted by matd <ma...@gmail.com> on 2016/06/15 14:13:10 UTC, 0 replies.
- vecotors inside columns - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/15 15:18:36 UTC, 0 replies.
- processing 50 gb data using just one machine - posted by spR <da...@gmail.com> on 2016/06/15 16:03:26 UTC, 6 replies.
- update mysql in spark - posted by spR <da...@gmail.com> on 2016/06/15 16:08:38 UTC, 1 replies.
- Fwd: ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks - posted by VG <vl...@gmail.com> on 2016/06/15 17:12:18 UTC, 1 replies.
- concat spark dataframes - posted by spR <da...@gmail.com> on 2016/06/15 18:57:50 UTC, 4 replies.
- Re: IllegalArgumentException UnsatisfiedLinkError snappy-1.1.2 spark-shell error - posted by Arul Ramachandran <ar...@gmail.com> on 2016/06/15 19:40:37 UTC, 0 replies.
- data too long - posted by spR <da...@gmail.com> on 2016/06/15 19:40:38 UTC, 0 replies.
- Reporting warnings from workers - posted by Mathieu Longtin <ma...@closetwork.org> on 2016/06/15 20:24:26 UTC, 2 replies.
- ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some - posted by S Sarkar <ss...@gmail.com> on 2016/06/15 20:25:58 UTC, 1 replies.
- [ANNOUNCE] Apache SystemML 0.10.0-incubating released - posted by Luciano Resende <lr...@apache.org> on 2016/06/15 23:18:45 UTC, 0 replies.
- java server error - spark - posted by spR <da...@gmail.com> on 2016/06/15 23:24:41 UTC, 10 replies.
- Error Running SparkPi.scala Example - posted by Krishna Kalyan <kr...@gmail.com> on 2016/06/15 23:37:00 UTC, 2 replies.
- GraphX performance and settings - posted by Maja Kabiljo <ma...@fb.com> on 2016/06/16 01:01:14 UTC, 2 replies.
- Is that possible to feed web request via spark application directly? - posted by Yu Wei <yu...@hotmail.com> on 2016/06/16 03:19:22 UTC, 1 replies.
- How to deal with tasks running too long? - posted by Utkarsh Sengar <ut...@gmail.com> on 2016/06/16 04:45:32 UTC, 3 replies.
- spark streaming application - deployment best practices - posted by vimal dinakaran <vi...@gmail.com> on 2016/06/16 05:59:19 UTC, 0 replies.
- Anyone has used Apache nifi - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/16 09:10:44 UTC, 1 replies.
- Spark cache behaviour when the source table is modified - posted by Anjali Chadha <an...@gmail.com> on 2016/06/16 09:24:42 UTC, 0 replies.
- STringindexer - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/16 09:31:19 UTC, 0 replies.
- Can I control the execution of Spark jobs? - posted by Haopu Wang <HW...@qilinsoft.com> on 2016/06/16 09:36:18 UTC, 4 replies.
- String indexer - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/16 10:06:29 UTC, 0 replies.
- Unable to execute sparkr jobs through Chronos - posted by Rodrick Brown <ro...@orchard-app.com> on 2016/06/16 10:57:47 UTC, 2 replies.
- In yarn-cluster mode, provide system prop to the client jvm - posted by "Ellis, Tom (Financial Markets IT)" <To...@LloydsBanking.com.INVALID> on 2016/06/16 11:02:33 UTC, 1 replies.
- how to load compressed (gzip) csv file using spark-csv - posted by Vamsi Krishna <va...@gmail.com> on 2016/06/16 11:27:39 UTC, 2 replies.
- [YARN] Questions about YARN's queues and Spark's FAIR scheduler - posted by Jacek Laskowski <ja...@japila.pl> on 2016/06/16 11:37:44 UTC, 4 replies.
- Spark crashes worker nodes with multiple application starts - posted by "Carlile, Ken" <ca...@janelia.hhmi.org> on 2016/06/16 11:40:51 UTC, 3 replies.
- cache datframe - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/16 12:17:21 UTC, 2 replies.
- Re: [scala-user] ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some - posted by Oliver Ruebenacker <cu...@gmail.com> on 2016/06/16 12:26:27 UTC, 0 replies.
- Kerberos setup in Apache spark connecting to remote HDFS/Yarn - posted by akhandeshi <am...@gmail.com> on 2016/06/16 13:32:17 UTC, 5 replies.
- advise please - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/16 13:42:28 UTC, 1 replies.
- difference between dataframe and dataframwrite - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/16 14:43:02 UTC, 3 replies.
- Unsubscribe - posted by Sanjeev Sagar <sa...@mypointscorp.com> on 2016/06/16 16:06:03 UTC, 0 replies.
- converting timestamp from UTC to many time zones - posted by ericjhilton <er...@gmail.com> on 2016/06/16 16:16:29 UTC, 1 replies.
- Recommended way to push data into HBase through Spark streaming - posted by Mohammad Tariq <do...@gmail.com> on 2016/06/16 16:42:38 UTC, 1 replies.
- spark sql broadcast join ? - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/06/16 19:05:10 UTC, 1 replies.
- Spark jobs without a login - posted by jay vyas <ja...@gmail.com> on 2016/06/16 20:32:53 UTC, 1 replies.
- Update Batch DF with Streaming - posted by Amit Assudani <aa...@impetus.com> on 2016/06/16 22:11:45 UTC, 2 replies.
- Spark Streaming WAL issue**: File exists and there is no append support! - posted by "tosaiganesh@gmail.com" <to...@gmail.com> on 2016/06/16 22:50:37 UTC, 0 replies.
- Skew data - posted by Selvam Raman <se...@gmail.com> on 2016/06/17 00:55:26 UTC, 1 replies.
- sparkR.init() can not load sparkPackages. - posted by Joseph <wx...@sina.com> on 2016/06/17 02:46:36 UTC, 1 replies.
- test - what is the wrong while adding one column in the dataframe - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/06/17 04:15:06 UTC, 1 replies.
- spark job killed without rhyme or reason - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/06/17 06:09:45 UTC, 0 replies.
- Re: spark job automatically killed without rhyme or reason - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/06/17 06:33:27 UTC, 11 replies.
- Stringindexers on multiple columns >1000 - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/17 07:39:58 UTC, 0 replies.
- update data frame inside function - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/17 08:25:42 UTC, 0 replies.
- Custom DataFrame filter - posted by Леонид Поляков <ow...@gmail.com> on 2016/06/17 08:45:30 UTC, 0 replies.
- binding two data frame - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/17 09:04:17 UTC, 0 replies.
- java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org - posted by VG <vl...@gmail.com> on 2016/06/17 09:08:08 UTC, 16 replies.
- spark-xml - xml parsing when rows only have attributes - posted by VG <vl...@gmail.com> on 2016/06/17 12:19:43 UTC, 2 replies.
- Spark UI shows finished when job had an error - posted by Sumona Routh <su...@gmail.com> on 2016/06/17 13:49:33 UTC, 3 replies.
- Running Java Implementationof StreamingKmeans - posted by Biplob Biswas <re...@gmail.com> on 2016/06/17 15:39:33 UTC, 1 replies.
- Best way to go from RDD to DataFrame of StringType columns - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/06/17 19:38:04 UTC, 5 replies.
- YARN Application Timeline service with Spark 2.0.0 issue - posted by Saisai Shao <sa...@gmail.com> on 2016/06/17 20:29:51 UTC, 0 replies.
- Running JavaBased Implementationof StreamingKmeans - posted by Biplob Biswas <re...@gmail.com> on 2016/06/17 20:41:27 UTC, 6 replies.
- Data Integrity / Model Quality Monitoring - posted by Benjamin Kim <bb...@gmail.com> on 2016/06/17 21:35:55 UTC, 0 replies.
- Dataset Select Function after Aggregate Error - posted by Pedro Rodriguez <sk...@gmail.com> on 2016/06/17 22:33:30 UTC, 9 replies.
- Spark 2.0 preview - How to configure warehouse for Catalyst? always pointing to /user/hive/warehouse - posted by Andrew Lee <al...@hotmail.com> on 2016/06/17 23:34:11 UTC, 0 replies.
- Spark 2.0 on YARN - Files in config archive not ending up on executor classpath - posted by Jonathan Kelly <jo...@gmail.com> on 2016/06/18 01:36:54 UTC, 5 replies.
- Python to Scala - posted by Aakash Basu <ra...@gmail.com> on 2016/06/18 04:34:37 UTC, 8 replies.
- Making spark read from sources other than HDFS - posted by Ramprakash Ramamoorthy <yo...@gmail.com> on 2016/06/18 09:52:51 UTC, 1 replies.
- How to cause a stage to fail (using spark-shell)? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/06/18 09:53:29 UTC, 5 replies.
- Many executors with the same ID in web UI (under Executors)? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/06/18 14:04:55 UTC, 7 replies.
- What does it mean when a executor has negative active tasks? - posted by Brandon White <bw...@gmail.com> on 2016/06/18 16:50:04 UTC, 2 replies.
- Running JavaBased Implementation of StreamingKmeans - posted by Biplob Biswas <re...@gmail.com> on 2016/06/18 17:30:52 UTC, 0 replies.
- CfP for Spark Summit Brussels, 2016 - posted by Jules Damji <dm...@comcast.net> on 2016/06/18 17:56:51 UTC, 0 replies.
- unsubscribe error - posted by Marco Platania <ma...@yahoo.it.INVALID> on 2016/06/18 19:04:09 UTC, 1 replies.
- Spark not using all the cluster instances in AWS EMR - posted by Natu Lauchande <nl...@gmail.com> on 2016/06/18 19:17:27 UTC, 1 replies.
- Re: Improving performance of a kafka spark streaming app - posted by Colin Kincaid Williams <di...@uw.edu> on 2016/06/18 19:40:25 UTC, 14 replies.
- Re: Creating tables for JSON data - posted by brendan kehoe <br...@gmail.com> on 2016/06/19 00:24:59 UTC, 0 replies.
- spark streaming - how to purge old data files in data directory - posted by Vamsi Krishna <va...@gmail.com> on 2016/06/19 00:28:59 UTC, 1 replies.
- Thanks For a Job Well Done !!! - posted by Krishna Sankar <ks...@gmail.com> on 2016/06/19 02:20:19 UTC, 1 replies.
- Are ser/de optimizations relevant with Dataset API and Encoders ? - posted by Amit Sela <am...@gmail.com> on 2016/06/19 06:55:02 UTC, 0 replies.
- Running Spark in local mode - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/06/19 08:39:51 UTC, 9 replies.
- plot importante variable in pyspark - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/19 08:44:58 UTC, 0 replies.
- Accessing system environment on Spark Worker - posted by Mohamed Taher AlRefaie <m....@msn.com> on 2016/06/19 11:46:35 UTC, 1 replies.
- Spark - “min key = null, max key = null” while reading ORC file - posted by Mohanraj Ragupathiraj <mo...@gmail.com> on 2016/06/20 04:01:42 UTC, 6 replies.
- Unable to acquire bytes of memory - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/20 09:27:51 UTC, 1 replies.
- tensor factorization FR - posted by Roberto Pagliari <ro...@asos.com> on 2016/06/20 10:20:53 UTC, 0 replies.
- JDBC load into tempTable - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/06/20 10:41:09 UTC, 2 replies.
- Beeline exception when connecting to Spark 2.0 ThriftServer running on yarn - posted by Lei Lei2 Gu <gu...@lenovo.com> on 2016/06/20 10:49:12 UTC, 0 replies.
- Is it possible to turn a SortMergeJoin into BroadcastHashJoin? - posted by 梅西0247 <zh...@dtdream.com> on 2016/06/20 11:06:41 UTC, 4 replies.
- Verifying if DStream is empty - posted by Praseetha <pr...@gmail.com> on 2016/06/20 12:30:05 UTC, 3 replies.
- dense_rank skips ranks on cube - posted by talgr <ta...@gmail.com> on 2016/06/20 14:00:28 UTC, 0 replies.
- Underutilized Cluster - posted by Chadha Pooja <Ch...@bcg.com> on 2016/06/20 15:01:31 UTC, 0 replies.
- Data Generators mllib -> ml - posted by Stephen Boesch <ja...@gmail.com> on 2016/06/20 16:29:26 UTC, 0 replies.
- Saving data using tempTable versus save() method - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/20 20:35:02 UTC, 5 replies.
- Re: SparkSQL issue: Spark 1.3.1 + hadoop 2.6 on CDH5.3 with parquet - posted by Satya <sa...@seagate.com> on 2016/06/20 21:19:29 UTC, 0 replies.
- javax.net.ssl.SSLHandshakeException: unable to find valid certification path to requested target - posted by Utkarsh Sengar <ut...@gmail.com> on 2016/06/20 23:03:12 UTC, 1 replies.
- Build Spark 2.0 succeeded but could not run it on YARN - posted by wgtmac <us...@gmail.com> on 2016/06/21 01:18:24 UTC, 3 replies.
- Notebook(s) for Spark 2.0 ? - posted by Stephen Boesch <ja...@gmail.com> on 2016/06/21 02:06:40 UTC, 0 replies.
- Re: Is it possible to turn a SortMergeJoin into BroadcastHashJoin? - posted by 梅西0247 <zh...@dtdream.com> on 2016/06/21 02:31:18 UTC, 1 replies.
- scala.NotImplementedError: put() should not be called on an EmptyStateMap while doing stateful computation on spark streaming - posted by umanga <bi...@gmail.com> on 2016/06/21 06:16:41 UTC, 2 replies.
- Running spark executor process with username in standalone mode - posted by Florian Philippon <fl...@gmail.com> on 2016/06/21 07:25:17 UTC, 0 replies.
- read.parquet or read.load - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/21 07:31:24 UTC, 0 replies.
- cast only some columns - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/21 11:16:38 UTC, 1 replies.
- [Spark + MLlib] How to prevent negative values in Linear regression? - posted by diplomatic Guru <di...@gmail.com> on 2016/06/21 12:38:41 UTC, 3 replies.
- FullOuterJoin on Spark - posted by "Rychnovsky, Dusan" <Du...@firma.seznam.cz> on 2016/06/21 13:16:56 UTC, 2 replies.
- Union of multiple RDDs - posted by Apurva Nandan <ap...@gmail.com> on 2016/06/21 13:48:19 UTC, 2 replies.
- Number of consumers in Kafka with Spark Streaming - posted by Guillermo Ortiz <ko...@gmail.com> on 2016/06/21 14:56:53 UTC, 1 replies.
- ) - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/21 15:02:01 UTC, 0 replies.
- Can Spark Streaming checkpoint only metadata ? - posted by Natu Lauchande <nl...@gmail.com> on 2016/06/21 15:39:23 UTC, 0 replies.
- Labeledpoint - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/21 16:12:31 UTC, 2 replies.
- Does saveAsHadoopFile depend on master? - posted by Pierre Villard <pi...@gmail.com> on 2016/06/21 18:03:52 UTC, 3 replies.
- How to do some pre-processing of the SQL in the Thrift server? - posted by Timothy Potter <th...@gmail.com> on 2016/06/21 18:07:17 UTC, 0 replies.
- Spark-Cassandra connector - posted by Joaquin Alzola <Jo...@lebara.com> on 2016/06/21 22:41:39 UTC, 0 replies.
- Silly question about Yarn client vs Yarn cluster modes... - posted by Michael Segel <ms...@hotmail.com> on 2016/06/21 23:58:58 UTC, 14 replies.
- Getting a DataFrame back as result from SparkIMain - posted by Jayant Shekhar <ja...@gmail.com> on 2016/06/22 00:39:20 UTC, 0 replies.
- Fwd: 'numBins' property not honoured in BinaryClassificationMetrics class when spark.default.parallelism is not set to 1 - posted by Sneha Shukla <sn...@gmail.com> on 2016/06/22 04:39:56 UTC, 1 replies.
- feture importance or variable importance - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/22 05:14:19 UTC, 0 replies.
- Spark 1.5.2 - Different results from reduceByKey over multiple iterations - posted by Nirav Patel <np...@xactlycorp.com> on 2016/06/22 05:42:23 UTC, 5 replies.
- Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher - posted by 另一片天 <95...@qq.com> on 2016/06/22 05:59:19 UTC, 0 replies.
- Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher - posted by Yash Sharma <ya...@gmail.com> on 2016/06/22 06:04:54 UTC, 9 replies.
- Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher - posted by 另一片天 <95...@qq.com> on 2016/06/22 06:10:20 UTC, 9 replies.
- how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure - posted by sandesh deshmane <sa...@gmail.com> on 2016/06/22 07:09:45 UTC, 11 replies.
- Spark Task failure with File segment length as negative - posted by Priya Ch <le...@gmail.com> on 2016/06/22 11:09:17 UTC, 0 replies.
- Can I use log4j2.xml in my Apache Saprk application - posted by Charan Adabala <ch...@gmail.com> on 2016/06/22 11:11:36 UTC, 1 replies.
- spark-1.6.1-bin-without-hadoop can not use spark-sql - posted by 喜之郎 <25...@qq.com> on 2016/06/22 11:38:51 UTC, 3 replies.
- Running JavaBased Implementation of StreamingKmeans Spark - posted by Biplob Biswas <re...@gmail.com> on 2016/06/22 12:39:56 UTC, 4 replies.
- [Spark + MLlib] how to update offline model with the online model - posted by diplomatic Guru <di...@gmail.com> on 2016/06/22 13:13:45 UTC, 0 replies.
- spark streaming questions - posted by pandees waran <pa...@gmail.com> on 2016/06/22 14:53:15 UTC, 3 replies.
- Unable to increase Active Tasks of a Spark Streaming Process in Yarn - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/06/22 15:43:36 UTC, 2 replies.
- Confusing argument of sql.functions.count - posted by Jakub Dubovsky <sp...@gmail.com> on 2016/06/22 16:06:57 UTC, 6 replies.
- OOM on the driver after increasing partitions - posted by Raghava Mutharaju <m....@gmail.com> on 2016/06/22 16:27:36 UTC, 6 replies.
- Networking Exceptions in Spark 1.6.1 with Dynamic Allocation and YARN Pre-Emption - posted by Nick Peterson <nr...@gmail.com> on 2016/06/22 18:28:50 UTC, 0 replies.
- Executors killed in Workers with Error: invalid log directory - posted by Yiannis Gkoufas <jo...@gmail.com> on 2016/06/22 20:03:10 UTC, 0 replies.
- Recovery techniques for Spark Streaming scheduling delay - posted by "C. Josephson" <cj...@uhana.io> on 2016/06/22 21:58:57 UTC, 0 replies.
- Explode row with start and end dates into row for each date - posted by John Aherne <jo...@justenough.com> on 2016/06/22 22:20:46 UTC, 3 replies.
- Re: Shuffle service fails to register driver - Spark - Mesos - posted by "Feller, Eugen" <eu...@verizon.com> on 2016/06/22 23:17:46 UTC, 0 replies.
- Re: Spark ml and PMML export - posted by jayantshekhar <ja...@gmail.com> on 2016/06/23 00:37:27 UTC, 4 replies.
- Creating a python port for a Scala Spark Projeect - posted by Daniel Imberman <da...@gmail.com> on 2016/06/23 02:07:59 UTC, 2 replies.
- Re: spark-1.6.1-bin-without-hadoop can not use spark-sql - posted by 喜之郎 <25...@qq.com> on 2016/06/23 02:10:04 UTC, 0 replies.
- Building Spark 2.X in Intellij - posted by Stephen Boesch <ja...@gmail.com> on 2016/06/23 02:25:18 UTC, 6 replies.
- why did spark2.0 Disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.) - posted by linxi zeng <li...@gmail.com> on 2016/06/23 03:10:59 UTC, 0 replies.
- NullPointerException when starting StreamingContext - posted by Sunita Arvind <su...@gmail.com> on 2016/06/23 03:20:32 UTC, 4 replies.
- How does Spark Streaming updateStateByKey or mapWithState scale with state size? - posted by Martin Eden <ma...@gmail.com> on 2016/06/23 07:37:03 UTC, 0 replies.
- categoricalFeaturesInfo - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/23 08:15:44 UTC, 2 replies.
- LogisticRegression.scala ERROR, require(Predef.scala) - posted by Ascot Moss <as...@gmail.com> on 2016/06/23 11:30:21 UTC, 1 replies.
- Performance issue with spark ml model to make single predictions on server side - posted by philippe v <gl...@gmail.com> on 2016/06/23 11:40:26 UTC, 1 replies.
- Confusion regarding sc.accumulableCollection(mutable.ArrayBuffer[String]()) type - posted by Daniel Haviv <da...@veracity-group.com> on 2016/06/23 12:04:25 UTC, 0 replies.
- Spark Thrift Server Concurrency - posted by Prabhu Joseph <pr...@gmail.com> on 2016/06/23 12:21:52 UTC, 3 replies.
- Change from distributed.MatrixEntry to Vector - posted by Pasquinell Urbani <pa...@exalitica.com> on 2016/06/23 13:12:52 UTC, 0 replies.
- Ideas to put a Spark ML model in production - posted by Saurabh Sardeshpande <sa...@gmail.com> on 2016/06/23 17:54:27 UTC, 0 replies.
- Data Frames Join by more than one column - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/06/23 17:57:22 UTC, 1 replies.
- Option Encoder - posted by Richard Marscher <rm...@localytics.com> on 2016/06/23 18:16:37 UTC, 1 replies.
- Multiple compute nodes in standalone mode - posted by avendaon <jn...@sharcnet.ca> on 2016/06/23 19:28:53 UTC, 1 replies.
- Kryo ClassCastException during Serialization/deserialization in Spark Streaming - posted by SRK <sw...@gmail.com> on 2016/06/23 19:34:18 UTC, 2 replies.
- Confusion about spark.shuffle.memoryFraction and spark.storage.memoryFraction - posted by Darshan Singh <da...@gmail.com> on 2016/06/23 20:40:14 UTC, 0 replies.
- Partitioning in spark - posted by Darshan Singh <da...@gmail.com> on 2016/06/23 20:46:35 UTC, 2 replies.
- destroyPythonWorker job in PySpark - posted by Krishna <re...@gmail.com> on 2016/06/23 21:59:48 UTC, 0 replies.
- Custom Optimizer - posted by Stephen Boesch <ja...@gmail.com> on 2016/06/23 23:56:56 UTC, 0 replies.
- RDD, Dataframe and Parquet order - posted by tuxx <he...@outlook.com> on 2016/06/24 00:26:59 UTC, 0 replies.
- Spark SQL Hive Authorization - posted by rmenon <rm...@ea.com> on 2016/06/24 00:40:19 UTC, 0 replies.
- Databricks' 2016 Survey on Apache Spark - posted by Jules Damji <dm...@comcast.net> on 2016/06/24 01:45:47 UTC, 0 replies.
- Cost of converting RDD's to dataframe and back - posted by pan <pr...@gmail.com> on 2016/06/24 06:00:18 UTC, 6 replies.
- Error Invoking Spark on Yarn on using Spark Submit - posted by puneet kumar <pu...@gmail.com> on 2016/06/24 07:14:30 UTC, 3 replies.
- How to write the DataFrame results back to HDFS with other then \n as record separator - posted by Radha krishna <gr...@gmail.com> on 2016/06/24 09:03:09 UTC, 2 replies.
- problem running spark with yarn-client not using spark-submit - posted by sy...@tsmc.com on 2016/06/24 11:01:36 UTC, 3 replies.
- How to convert a Random Forest model built in R to a similar model in Spark - posted by Neha Mehta <ne...@gmail.com> on 2016/06/24 11:40:54 UTC, 2 replies.
- Spark Xml schema help - posted by Nandan Thakur <na...@gmail.com> on 2016/06/24 12:17:30 UTC, 0 replies.
- Logging trait in Spark 2.0 - posted by Paolo Patierno <pp...@live.com> on 2016/06/24 13:07:19 UTC, 5 replies.
- streaming on yarn - posted by Alex Dzhagriev <dz...@gmail.com> on 2016/06/24 13:29:54 UTC, 0 replies.
- DataFrame versus Dataset creation and usage - posted by Martin Serrano <ma...@attivio.com> on 2016/06/24 14:27:53 UTC, 4 replies.
- Spark connecting to Hive in another EMR cluster - posted by Dave Maughan <da...@gmail.com> on 2016/06/24 16:05:13 UTC, 0 replies.
- How can I use pyspark.ml.evaluation.BinaryClassificationEvaluator with point predictions instead of confidence intervals? - posted by apu <ap...@gmail.com> on 2016/06/24 17:42:15 UTC, 1 replies.
- Model Quality Tracking - posted by Benjamin Kim <bb...@gmail.com> on 2016/06/24 21:01:47 UTC, 0 replies.
- Batch details are missing - posted by "C. Josephson" <cj...@uhana.io> on 2016/06/24 21:57:54 UTC, 0 replies.
- Spark 2.0 Continuous Processing - posted by kmat <ku...@hotmail.com> on 2016/06/24 23:06:57 UTC, 0 replies.
- Poor performance of using spark sql over gzipped json files - posted by Shuai Lin <li...@gmail.com> on 2016/06/25 00:05:07 UTC, 0 replies.
- spark-sql jdbc dataframe mysql data type issue - posted by 刘虓 <ip...@gmail.com> on 2016/06/25 12:36:10 UTC, 1 replies.
- Streaming and Batch code sharing - posted by Nikhil Goyal <no...@gmail.com> on 2016/06/26 03:24:24 UTC, 0 replies.
- Spark Task is not created - posted by Ravindra <ra...@gmail.com> on 2016/06/26 04:21:23 UTC, 2 replies.
- Aggregator (Spark 2.0) skips aggregation is zero(0 returns null - posted by Amit Sela <am...@gmail.com> on 2016/06/26 09:06:43 UTC, 7 replies.
- Running of Continuous Aggregation example - posted by Chang Lim <ch...@gmail.com> on 2016/06/26 09:45:12 UTC, 0 replies.
- add multiple columns - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/26 12:20:24 UTC, 2 replies.
- alter table with hive context - posted by pseudo oduesp <ps...@gmail.com> on 2016/06/26 12:34:25 UTC, 1 replies.
- What is the explanation of "ConvertToUnsafe" in "Physical Plan" - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/26 20:11:49 UTC, 1 replies.
- Spark 1.6.1: Unexpected partition behavior? - posted by Randy Gelhausen <rg...@gmail.com> on 2016/06/26 22:34:22 UTC, 1 replies.
- Difference between Dataframe and RDD Persisting - posted by Brandon White <bw...@gmail.com> on 2016/06/27 05:54:53 UTC, 1 replies.
- Last() Window Function - posted by Anton Okolnychyi <an...@gmail.com> on 2016/06/27 08:50:02 UTC, 0 replies.
- [Spark 1.6.1] Beeline cannot start on Windows7 - posted by Haopu Wang <HW...@qilinsoft.com> on 2016/06/27 09:17:39 UTC, 0 replies.
- Querying Hive tables from Spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/27 09:19:48 UTC, 0 replies.
- Spark SQL poor join performance - posted by vegass <ag...@yahoo.com> on 2016/06/27 09:48:20 UTC, 1 replies.
- GraphX :Running on a Cluster - posted by isaranto <sa...@aueb.gr> on 2016/06/27 12:02:23 UTC, 0 replies.
- Spark partition formula on standalone mode? - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/06/27 13:42:08 UTC, 0 replies.
- Utils and Logging cannot be accessed in package .... - posted by Paolo Patierno <pp...@live.com> on 2016/06/27 15:20:54 UTC, 2 replies.
- run spark sql with script transformation faild - posted by linxi zeng <li...@gmail.com> on 2016/06/27 15:30:44 UTC, 0 replies.
- Arrays in Datasets (1.6.1) - posted by Daniel Imberman <da...@gmail.com> on 2016/06/27 15:50:35 UTC, 1 replies.
- Best practice for handing tables between pipeline components - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/06/27 16:14:28 UTC, 4 replies.
- Spark ML - Java implementation of custom Transformer - posted by Mehdi Meziane <me...@ldmobile.net> on 2016/06/27 18:57:39 UTC, 0 replies.
- MapWithState would not restore from checkpoint. - posted by Sergey Zelvenskiy <se...@actions.im> on 2016/06/27 19:09:43 UTC, 0 replies.
- JavaSparkContext: dependency on ui/ - posted by jay vyas <ja...@gmail.com> on 2016/06/27 21:02:58 UTC, 0 replies.
- Running into issue using SparkIMain - posted by Jayant Shekhar <ja...@gmail.com> on 2016/06/27 22:19:01 UTC, 2 replies.
- Best way to tranform string label into long label for classification problem - posted by Jaonary Rabarisoa <ja...@gmail.com> on 2016/06/28 07:29:26 UTC, 2 replies.
- Create JavaRDD from list in Spark 2.0 - posted by Rafael Caballero <ra...@ucm.es> on 2016/06/28 09:58:47 UTC, 0 replies.
- Spark master shuts down when one of zookeeper dies - posted by vimal dinakaran <vi...@gmail.com> on 2016/06/28 12:52:46 UTC, 3 replies.
- Re: Restart App and consume from checkpoint using direct kafka API - posted by vimal dinakaran <vi...@gmail.com> on 2016/06/28 12:54:34 UTC, 0 replies.
- Issue with Spark on 25 nodes cluster - posted by ANDREA SPINA <74...@studenti.unimore.it> on 2016/06/28 13:04:19 UTC, 0 replies.
- Set the node the spark driver will be started - posted by adaman79 <fe...@codecentric.de> on 2016/06/28 14:27:37 UTC, 8 replies.
- Modify the functioning of zipWithIndex function for RDDs - posted by Punit Naik <na...@gmail.com> on 2016/06/28 17:31:04 UTC, 4 replies.
- Need help with spark GraphiteSink - posted by Vijay Vangapandu <Vi...@eharmony.com.INVALID> on 2016/06/28 18:53:30 UTC, 0 replies.
- Random Forest Classification - posted by Rich Tarro <ri...@gmail.com> on 2016/06/28 20:21:43 UTC, 3 replies.
- Integration tests for Spark Streaming - posted by SRK <sw...@gmail.com> on 2016/06/28 20:25:51 UTC, 1 replies.
- Joining a compressed ORC table with a non compressed text table - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/06/28 21:53:06 UTC, 14 replies.
- Spark SQL concurrent runs fails with java.util.concurrent.TimeoutException: Futures timed out after [300 seconds] - posted by Jesse F Chen <jf...@us.ibm.com> on 2016/06/28 22:35:33 UTC, 0 replies.
- Driver zombie process (standalone cluster) - posted by Tomer Benyamini <to...@gmail.com> on 2016/06/29 07:05:36 UTC, 0 replies.
- Job aborted due to not serializable exception - posted by Paolo Patierno <pp...@live.com> on 2016/06/29 08:19:04 UTC, 0 replies.
- Spark sql dataframe - posted by pooja mehta <sp...@gmail.com> on 2016/06/29 09:50:45 UTC, 0 replies.
- Metadata for the StructField - posted by Ted Yu <yu...@gmail.com> on 2016/06/29 10:01:06 UTC, 0 replies.
- Do tasks from the same application run in different JVMs - posted by Huang Meilong <im...@outlook.com> on 2016/06/29 12:47:24 UTC, 1 replies.
- Spark jobs - posted by Joaquin Alzola <Jo...@lebara.com> on 2016/06/29 12:58:26 UTC, 1 replies.
- Spark RDD aggregate action behaves strangely - posted by Kaiyin Zhong <ki...@gmail.com> on 2016/06/29 13:28:11 UTC, 0 replies.
- Can Spark Dataframes preserve order when joining? - posted by Jestin Ma <je...@gmail.com> on 2016/06/29 13:32:37 UTC, 2 replies.
- Using R code as part of a Spark Application - posted by Gilad Landau <Gi...@clicktale.com> on 2016/06/29 13:40:29 UTC, 11 replies.
- Possible to broadcast a function? - posted by Aaron Perrin <ap...@timerazor.com> on 2016/06/29 14:00:07 UTC, 7 replies.
- Unsubscribe - 3rd time - posted by Steve Florence <sf...@ypm.com> on 2016/06/29 15:46:30 UTC, 5 replies.
- groupBy cannot handle large RDDs - posted by Kaiyin Zhong <ki...@gmail.com> on 2016/06/29 17:38:35 UTC, 0 replies.
- Apache Spark Is Hanging when fetch data from SQL Server 2008 - posted by Gastón Schabas <ga...@batangamedia.com> on 2016/06/29 17:53:20 UTC, 0 replies.
- Kudu Connector - posted by Benjamin Kim <bb...@gmail.com> on 2016/06/29 18:27:09 UTC, 0 replies.
- Friendly Reminder: Spark Summit EU CfP Deadline July 1, 2016 - posted by Jules Damji <dm...@comcast.net> on 2016/06/29 18:50:22 UTC, 0 replies.
- Error report file is deleted automatically after spark application finished - posted by prateek arora <pr...@gmail.com> on 2016/06/29 21:47:35 UTC, 0 replies.
- PySpark crashed because "remote RPC client disassociated" - posted by "jw.cmu" <ji...@gmail.com> on 2016/06/29 21:52:37 UTC, 0 replies.
- Regarding Decision Tree - posted by Chintan Bhatt <ch...@charusat.ac.in> on 2016/06/30 04:16:01 UTC, 0 replies.
- deploy-mode flag in spark-sql cli - posted by Huang Meilong <im...@outlook.com> on 2016/06/30 04:16:29 UTC, 3 replies.
- Re: Error report file is deleted automatically after spark application finished - posted by dhruve ashar <dh...@gmail.com> on 2016/06/30 04:30:27 UTC, 2 replies.
- Change spark dataframe to LabeledPoint in Java - posted by Abhishek Anand <ab...@gmail.com> on 2016/06/30 07:29:33 UTC, 0 replies.
- How to use scala.tools.nsc.interpreter.IMain in Spark, just like calling eval in Perl. - posted by Fanchao Meng <fa...@hotmail.com> on 2016/06/30 07:34:17 UTC, 1 replies.
- Re: deploy-mode flag in spark-sql cli - posted by Huang Meilong <im...@outlook.com> on 2016/06/30 07:49:06 UTC, 0 replies.
- One map per folder in spark or Hadoop - posted by "Balachandar R.A." <ba...@gmail.com> on 2016/06/30 08:42:50 UTC, 0 replies.
- Remote RPC client disassociated - posted by Joaquin Alzola <Jo...@lebara.com> on 2016/06/30 11:34:05 UTC, 2 replies.
- how to add a column according to an existing column of a dataframe? - posted by lu...@sina.com on 2016/06/30 13:08:58 UTC, 1 replies.
- Call Scala API from PySpark - posted by Pedro Rodriguez <sk...@gmail.com> on 2016/06/30 16:53:11 UTC, 5 replies.
- How to spin up Kafka using docker and use for Spark Streaming Integration tests - posted by SRK <sw...@gmail.com> on 2016/06/30 18:19:12 UTC, 0 replies.
- RDD to DataFrame question with JsValue in the mix - posted by Do...@ODDO, od...@gmail.com on 2016/06/30 20:36:54 UTC, 0 replies.
- Logical Plan - posted by Darshan Singh <da...@gmail.com> on 2016/06/30 20:58:02 UTC, 4 replies.