user@spark.apache.org, 2016-02

- Unpersist RDD in Graphx - posted by "Zhang, Jingyu" <ji...@news.com.au> on 2016/02/01 00:35:53 UTC, 1 reply.
- Re: Reading lzo+index with spark-csv (Splittable reads) - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/02/01 02:11:41 UTC, 0 replies.
- confusing about start ipython notebook with spark between 1.3.x and 1.6.x - posted by charles li <ch...@gmail.com> on 2016/02/01 03:43:37 UTC, 0 replies.
- DAG visualization: no visualization information available with history server - posted by Raghava <m....@gmail.com> on 2016/02/01 05:38:37 UTC, 0 replies.
- how to introduce spark to your colleague if he has no background about *** spark related - posted by charles li <ch...@gmail.com> on 2016/02/01 07:31:28 UTC, 3 replies.
- code size of spark - posted by charles li <ch...@gmail.com> on 2016/02/01 08:39:36 UTC, 0 replies.
- Spark job does not perform well when some RDD in memory and some on Disk - posted by Prabhu Joseph <pr...@gmail.com> on 2016/02/01 09:32:44 UTC, 3 replies.
- Re: mapWithState: remove key - posted by Udo Fholl <ud...@gmail.com> on 2016/02/01 10:40:22 UTC, 0 replies.
- When char will be availble in Spark - posted by Dr Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/01 10:42:29 UTC, 0 replies.
- Re: Repartition taking place for all previous windows even after checkpointing - posted by Abhishek Anand <ab...@gmail.com> on 2016/02/01 11:31:00 UTC, 0 replies.
- Using Java spring injection with spark - posted by HARSH TAKKAR <ta...@gmail.com> on 2016/02/01 11:58:48 UTC, 4 replies.
- Re: [MLlib] What is the best way to forecast the next month page visit? - posted by diplomatic Guru <di...@gmail.com> on 2016/02/01 12:29:04 UTC, 3 replies.
- Guidelines for writing SPARK packages - posted by Praveen Devarao <pr...@in.ibm.com> on 2016/02/01 13:03:18 UTC, 4 replies.
- Spark Executor retries infinitely - posted by Prabhu Joseph <pr...@gmail.com> on 2016/02/01 13:16:26 UTC, 2 replies.
- [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda - posted by David Russell <th...@gmail.com> on 2016/02/01 13:23:43 UTC, 2 replies.
- Can't view executor logs in web UI on Windows - posted by Mark Pavey <ma...@thefilter.com> on 2016/02/01 14:13:49 UTC, 4 replies.
- AFTSurvivalRegression Prediction and QuantilProbabilities - posted by Christine Jula <Ch...@alexanderthamm.com> on 2016/02/01 15:09:30 UTC, 1 reply.
- java.nio.channels.ClosedChannelException in Spark Streaming KafKa Direct - posted by SRK <sw...@gmail.com> on 2016/02/01 16:59:23 UTC, 1 reply.
- Failed to 'collect_set' with dataset in spark 1.6 - posted by Alexandr Dzhagriev <dz...@gmail.com> on 2016/02/01 17:50:22 UTC, 7 replies.
- Re: Spark Caching Kafka Metadata - posted by Benjamin Han <be...@gmail.com> on 2016/02/01 18:09:45 UTC, 0 replies.
- Spark MLLlib Ideal way to convert categorical features into LabeledPoint RDD? - posted by unk1102 <um...@gmail.com> on 2016/02/01 18:21:38 UTC, 0 replies.
- How to build interactive dash boards with spark? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/01 20:46:22 UTC, 0 replies.
- How to deal with same class mismatch? - posted by Daniel Valdivia <ho...@danielvaldivia.com> on 2016/02/01 21:07:15 UTC, 0 replies.
- Redirect Spark Logs to Kafka - posted by Ashish Soni <as...@gmail.com> on 2016/02/01 21:20:25 UTC, 1 reply.
- Failed job not throwing exception - posted by Nick Buroojy <ni...@civitaslearning.com> on 2016/02/01 21:46:11 UTC, 0 replies.
- Re: local class incompatible: stream classdesc serialVersionUID - posted by Holden Karau <ho...@pigscanfly.ca> on 2016/02/01 23:08:42 UTC, 2 replies.
- SPARK_WORKER_INSTANCES deprecated - posted by "Lin, Hao" <Ha...@finra.org> on 2016/02/01 23:19:08 UTC, 0 replies.
- Master failover and active jobs - posted by aant00 <aa...@yahoo.com> on 2016/02/01 23:19:55 UTC, 0 replies.
- Need help in spark-Scala program - posted by Vinti Maheshwari <vi...@gmail.com> on 2016/02/01 23:25:42 UTC, 1 reply.
- try to read multiple bz2 files in s3 - posted by "Lin, Hao" <Ha...@finra.org> on 2016/02/01 23:35:59 UTC, 3 replies.
- Re: SPARK_WORKER_INSTANCES deprecated - posted by Ted Yu <yu...@gmail.com> on 2016/02/01 23:44:44 UTC, 2 replies.
- Getting the size of a broadcast variable - posted by "apu mishra . rr" <ap...@gmail.com> on 2016/02/02 00:20:35 UTC, 2 replies.
- Using accumulator to push custom logs to driver - posted by Utkarsh Sengar <ut...@gmail.com> on 2016/02/02 00:24:25 UTC, 3 replies.
- Spark Streaming application designing question - posted by Vinti Maheshwari <vi...@gmail.com> on 2016/02/02 00:32:56 UTC, 0 replies.
- Re: Spark, Mesos, Docker and S3 - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2016/02/02 00:56:33 UTC, 0 replies.
- Re: How to control the number of files for dynamic partition in Spark SQL? - posted by Benyi Wang <be...@gmail.com> on 2016/02/02 01:21:37 UTC, 0 replies.
- Spark Standalone cluster job to connect Hbase is Stuck - posted by sudhir patil <sp...@gmail.com> on 2016/02/02 01:25:55 UTC, 3 replies.
- Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2 - posted by Jia Zou <ja...@gmail.com> on 2016/02/02 01:36:06 UTC, 1 reply.
- unsubscribe email - posted by Eduardo Costa Alfaia <e....@unibs.it> on 2016/02/02 01:38:38 UTC, 1 reply.
- RE: saveAsTextFile is not writing to local fs - posted by Mohammed Guller <mo...@glassbeam.com> on 2016/02/02 02:45:37 UTC, 2 replies.
- how to covert millisecond time to SQL timeStamp - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/02 02:51:05 UTC, 3 replies.
- Error w/ Invertable ReduceByKeyAndWindow - posted by Bryan Jeffrey <br...@gmail.com> on 2016/02/02 03:19:42 UTC, 1 reply.
- questions about progress bar status [stuck]? - posted by charles li <ch...@gmail.com> on 2016/02/02 03:23:42 UTC, 0 replies.
- What is the correct way to reset a Linux in Cluster? - posted by pengzhang130 <pz...@gmail.com> on 2016/02/02 04:10:13 UTC, 0 replies.
- Spark Streaming with Kafka - batch DStreams in memory - posted by p pathiyil <pa...@gmail.com> on 2016/02/02 05:11:33 UTC, 1 reply.
- Re: Explaination for info shown in UI - posted by Yogesh Mahajan <ym...@snappydata.io> on 2016/02/02 05:53:52 UTC, 0 replies.
- Is there some open source tools which implements draggable widget and make the app runing in a form of DAG ? - posted by zml张明磊 <mi...@Ctrip.com> on 2016/02/02 05:57:41 UTC, 0 replies.
- How to calculate weighted degrees in GraphX - posted by "Balachandar R.A." <ba...@gmail.com> on 2016/02/02 07:04:53 UTC, 0 replies.
- can we do column bind of 2 dataframes in spark R? similar to cbind in R? - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/02/02 07:08:22 UTC, 1 reply.
- Re: Spark streaming and ThreadLocal - posted by N B <nb...@gmail.com> on 2016/02/02 08:27:42 UTC, 0 replies.
- Spark Streaming:Could not compute split - posted by aafri <87...@qq.com> on 2016/02/02 10:30:53 UTC, 0 replies.
- Master failover results in running job marked as "WAITING" - posted by Anthony Tang <aa...@yahoo.com.INVALID> on 2016/02/02 10:48:36 UTC, 2 replies.
- Spark saveAsHadoopFile stage fails with ExecutorLostfailure - posted by Prabhu Joseph <pr...@gmail.com> on 2016/02/02 15:21:56 UTC, 0 replies.
- MLLib embedded dependencies - posted by Valentin Popov <va...@gmail.com> on 2016/02/02 17:51:17 UTC, 0 replies.
- [MLLib] Is the order of the coefficients in a LogisticRegresionModel kept ? - posted by jmvllt <mo...@gmail.com> on 2016/02/02 18:21:44 UTC, 1 reply.
- optimal way to load parquet files with partition - posted by Wei Chen <we...@gmail.com> on 2016/02/02 19:07:42 UTC, 1 reply.
- Re: Spark Pattern and Anti-Pattern - posted by Lars Albertsson <la...@mapflat.com> on 2016/02/02 22:11:03 UTC, 0 replies.
- Spark 1.5.2 memory error - posted by Stefan Panayotov <sp...@msn.com> on 2016/02/02 22:22:00 UTC, 15 replies.
- Error trying to get DF for Hive table stored HBase - posted by Doug Balog <do...@dugos.com> on 2016/02/02 22:40:38 UTC, 0 replies.
- [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers - posted by Roberto Coluccio <ro...@gmail.com> on 2016/02/02 23:40:39 UTC, 4 replies.
- Re: Error trying to get DF for Hive table stored HBase - posted by Ted Yu <yu...@gmail.com> on 2016/02/03 00:33:45 UTC, 0 replies.
- recommendProductsForUser for a subset of user - posted by Roberto Pagliari <ro...@asos.com> on 2016/02/03 00:58:02 UTC, 1 reply.
- Spark 1.5.2 - are new Project Tungsten optimizations available on RDD as well? - posted by Nirav Patel <np...@xactlycorp.com> on 2016/02/03 01:15:08 UTC, 0 replies.
- Re: Spark DataFrame Catalyst - Another Oracle like query optimizer? - posted by Nirav Patel <np...@xactlycorp.com> on 2016/02/03 01:17:14 UTC, 14 replies.
- Best way to process large number of (non-text) files in deeply nested folder hierarchy - posted by Boris Capitanu <bo...@hotmail.com> on 2016/02/03 01:53:20 UTC, 0 replies.
- Dynamic sql in Spark 1.5 - posted by Divya Gehlot <di...@gmail.com> on 2016/02/03 03:49:38 UTC, 3 replies.
- question on spark.streaming.kafka.maxRetries - posted by Chen Song <ch...@gmail.com> on 2016/02/03 04:10:17 UTC, 1 reply.
- Union of RDDs without the overhead of Union - posted by Jerry Lam <ch...@gmail.com> on 2016/02/03 05:05:07 UTC, 3 replies.
- Overriding toString and hashCode with Spark streaming - posted by N B <nb...@gmail.com> on 2016/02/03 06:18:40 UTC, 0 replies.
- make-distribution fails due to wrong order of modules - posted by Koert Kuipers <ko...@tresata.com> on 2016/02/03 06:35:04 UTC, 0 replies.
- how to calculate -- executor-memory,num-executors,total-executor-cores - posted by Divya Gehlot <di...@gmail.com> on 2016/02/03 07:13:22 UTC, 1 reply.
- DataFrame First method is resulting different results in each iteration - posted by satish chandra j <js...@gmail.com> on 2016/02/03 11:15:50 UTC, 4 replies.
- saveDF issue: dealing with missing values - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/02/03 11:35:20 UTC, 1 reply.
- sparkR not able to create /append new columns - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/02/03 12:06:31 UTC, 5 replies.
- Spark streaming archive results - posted by Udo Fholl <ud...@gmail.com> on 2016/02/03 13:01:10 UTC, 0 replies.
- Spark Streaming: Dealing with downstream services faults - posted by Udo Fholl <ud...@gmail.com> on 2016/02/03 13:03:51 UTC, 0 replies.
- spark-cassandra - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2016/02/03 13:37:21 UTC, 2 replies.
- spark metrics question - posted by Matt K <ma...@gmail.com> on 2016/02/03 14:32:57 UTC, 6 replies.
- Re: java.lang.ArrayIndexOutOfBoundsException when attempting broadcastjoin - posted by Alexandr Dzhagriev <dz...@gmail.com> on 2016/02/03 15:14:47 UTC, 0 replies.
- Spark 1.5 Streaming + Kafka 0.9.0 - posted by Pavel Sýkora <pa...@seznam.cz> on 2016/02/03 16:56:28 UTC, 1 reply.
- Spark 1.6.0 HiveContext NPE - posted by "Shipper, Jay [USA]" <Sh...@bah.com> on 2016/02/03 17:33:53 UTC, 2 replies.
- Re: [External] Re: Spark 1.6.0 HiveContext NPE - posted by "Shipper, Jay [USA]" <Sh...@bah.com> on 2016/02/03 18:06:14 UTC, 6 replies.
- Spark with SAS - posted by Sourav Mazumder <so...@gmail.com> on 2016/02/03 18:43:42 UTC, 2 replies.
- Spark Streaming - 1.6.0: mapWithState Kinesis huge memory usage - posted by Udo Fholl <ud...@gmail.com> on 2016/02/03 18:52:50 UTC, 5 replies.
- Connect to two different HDFS servers with different usernames - posted by Wayne Song <wa...@gmail.com> on 2016/02/03 20:11:09 UTC, 0 replies.
- Re: Product similarity with TF/IDF and Cosine similarity (DIMSUM) - posted by Karl Higley <km...@gmail.com> on 2016/02/03 20:28:28 UTC, 0 replies.
- Spark 1.5.2 Yarn Application Master - resiliencey - posted by Nirav Patel <np...@xactlycorp.com> on 2016/02/03 20:46:50 UTC, 6 replies.
- Low latency queries much slower in 1.6.0 - posted by Younes Naguib <Yo...@tritondigital.com> on 2016/02/03 21:17:00 UTC, 1 reply.
- Spark Streaming: My kafka receivers are not consuming in parallel - posted by Jorge Rodriguez <jo...@bloomreach.com> on 2016/02/03 21:44:08 UTC, 1 reply.
- Parquet StringType column readable as plain-text despite being Gzipped - posted by Sung Hwan Chung <co...@cs.stanford.edu> on 2016/02/03 22:16:54 UTC, 0 replies.
- Cassandra BEGIN BATCH - posted by FrankFlaherty <fr...@pega.com> on 2016/02/03 22:45:11 UTC, 2 replies.
- Nearest neighbors in Spark with Annoy - posted by "apu mishra . rr" <ap...@gmail.com> on 2016/02/03 23:04:48 UTC, 0 replies.
- SparkOscope: Enabling Spark Optimization through Cross-stack Monitoring and Visualization - posted by Yiannis Gkoufas <jo...@gmail.com> on 2016/02/04 01:44:44 UTC, 0 replies.
- How parquet file decide task number? - posted by Gavin Yue <yu...@gmail.com> on 2016/02/04 02:05:25 UTC, 1 reply.
- clear cache using spark sql cli - posted by "fightfate@163.com" <fi...@163.com> on 2016/02/04 04:16:43 UTC, 4 replies.
- Is there a any plan to develop SPARK with c++?? - posted by DaeJin Jung <ha...@gmail.com> on 2016/02/04 06:49:16 UTC, 1 reply.
- About cache table performance in spark sql - posted by "fightfate@163.com" <fi...@163.com> on 2016/02/04 06:55:04 UTC, 4 replies.
- Re: spark streaming web ui not showing the events - direct kafka api - posted by vimal dinakaran <vi...@gmail.com> on 2016/02/04 07:07:41 UTC, 1 reply.
- Need to user univariate summary stats - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/04 10:22:33 UTC, 1 reply.
- add new column in the schema + Dataframe - posted by Divya Gehlot <di...@gmail.com> on 2016/02/04 10:28:57 UTC, 1 reply.
- [Spark 1.6] Univariate Stats using apache spark - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/04 10:34:50 UTC, 0 replies.
- library dependencies to run spark local mode - posted by Valentin Popov <va...@gmail.com> on 2016/02/04 11:49:44 UTC, 3 replies.
- [Spark 1.6] Mismatch in kurtosis values - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/04 12:21:51 UTC, 1 reply.
- problem in creating function in sparkR for dummy handling - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/02/04 12:51:06 UTC, 0 replies.
- Question on RDD caching - posted by Vishnu Viswanath <vi...@gmail.com> on 2016/02/04 14:58:15 UTC, 0 replies.
- PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S] - posted by "Yuval.Itzchakov" <yu...@gmail.com> on 2016/02/04 15:56:53 UTC, 7 replies.
- Using jar bundled log4j.xml on worker nodes - posted by Matthias Niehoff <ma...@codecentric.de> on 2016/02/04 18:06:24 UTC, 2 replies.
- Spark Cassandra Atomic Inserts - posted by "Flaherty, Frank" <Fr...@pega.com> on 2016/02/04 18:45:44 UTC, 0 replies.
- Memory tuning in spark sql - posted by AR...@cognizant.com on 2016/02/04 18:48:29 UTC, 2 replies.
- Reading large set of files in Spark - posted by Akhilesh Pathodia <pa...@gmail.com> on 2016/02/04 19:58:50 UTC, 1 reply.
- sc.textFile the number of the workers to parallelize - posted by "Lin, Hao" <Ha...@finra.org> on 2016/02/04 21:04:39 UTC, 0 replies.
- Dataset Encoders for SparseVector - posted by "raj.kumar" <ra...@hooklogic.com> on 2016/02/04 21:22:52 UTC, 1 reply.
- Re: Broadcast join on multiple dataframes - posted by Srikanth <sr...@gmail.com> on 2016/02/04 21:54:40 UTC, 0 replies.
- Recommended storage solution for my setup (~5M items, 10KB pr.) - posted by habitats <ma...@habitats.no> on 2016/02/04 21:58:52 UTC, 3 replies.
- cause of RPC error? - posted by AlexG <sw...@gmail.com> on 2016/02/04 22:34:34 UTC, 1 reply.
- submit spark job with spcified file for driver - posted by alexeyy3 <al...@searshc.com> on 2016/02/04 23:17:43 UTC, 2 replies.
- Driver not able to restart the job automatically after the application of Streaming with Kafka Direct went down - posted by SRK <sw...@gmail.com> on 2016/02/05 02:30:14 UTC, 1 reply.
- Unit test with sqlContext - posted by Steve Annessa <st...@gmail.com> on 2016/02/05 02:36:56 UTC, 4 replies.
- kafkaDirectStream usage error - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/02/05 02:58:28 UTC, 1 reply.
- Kafka directsream receiving rate - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/02/05 03:03:19 UTC, 1 reply.
- Slowness in Kmeans calculating fastSquaredDistance - posted by Li Ming Tsai <ma...@ltsai.com> on 2016/02/05 03:56:02 UTC, 2 replies.
- Add Singapore meetup - posted by Li Ming Tsai <ma...@ltsai.com> on 2016/02/05 04:07:37 UTC, 0 replies.
- spark.storage.memoryFraction for shuffle-only jobs - posted by Ruslan Dautkhanov <da...@gmail.com> on 2016/02/05 04:14:59 UTC, 0 replies.
- rdd cache priority - posted by charles li <ch...@gmail.com> on 2016/02/05 04:15:09 UTC, 1 reply.
- SQL Statement on DataFrame - posted by Nishant Aggarwal <ni...@gmail.com> on 2016/02/05 04:28:03 UTC, 2 replies.
- Please help with external package using --packages option in spark-shell - posted by Jeff - Data Bean Australia <da...@gmail.com> on 2016/02/05 05:00:21 UTC, 2 replies.
- Re: sc.textFile the number of the workers to parallelize - posted by Takeshi Yamamuro <li...@gmail.com> on 2016/02/05 05:41:05 UTC, 1 reply.
- spark.executor.memory ? is used just for cache RDD or both cache RDD and the runtime of cores on worker? - posted by charles li <ch...@gmail.com> on 2016/02/05 06:26:26 UTC, 1 reply.
- different behavior while using createDataFrame and read.df in SparkR - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/02/05 07:44:28 UTC, 3 replies.
- pass one dataframe column value to another dataframe filter expression + Spark 1.5 + scala - posted by Divya Gehlot <di...@gmail.com> on 2016/02/05 08:41:32 UTC, 2 replies.
- Too many open files, why changing ulimit not effecting? - posted by Mohamed Nadjib MAMI <ma...@iai.uni-bonn.de> on 2016/02/05 10:42:34 UTC, 3 replies.
- DenseMatrix update - posted by Zapper22 <ma...@gmail.com> on 2016/02/05 12:36:43 UTC, 0 replies.
- Hadoop credentials missing in some tasks? - posted by Gerard Maas <ge...@gmail.com> on 2016/02/05 12:58:26 UTC, 1 reply.
- pyspark - spark history server - posted by cs user <ac...@gmail.com> on 2016/02/05 15:08:21 UTC, 1 reply.
- Re: Kafka directsream receiving rate - posted by Cody Koeninger <co...@koeninger.org> on 2016/02/05 17:37:42 UTC, 8 replies.
- What is the best way to JOIN two 10TB csv files and three 100kb files on Spark? - posted by Rex X <dn...@gmail.com> on 2016/02/05 18:25:20 UTC, 1 reply.
- Help needed in deleting a message posted in Spark User List - posted by swetha kasireddy <sw...@gmail.com> on 2016/02/05 18:33:44 UTC, 3 replies.
- How to edit/delete a message posted in Apache Spark User List? - posted by SRK <sw...@gmail.com> on 2016/02/05 18:35:46 UTC, 1 reply.
- Spark process failing to receive data from the Kafka queue in yarn-client mode. - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/02/05 18:38:18 UTC, 0 replies.
- Please Add Our Meetup to the Spark Meetup List - posted by Timothy Spann <ti...@airisdata.com> on 2016/02/05 20:22:39 UTC, 1 reply.
- Shuffle memory woes - posted by Corey Nolet <cj...@gmail.com> on 2016/02/05 22:07:47 UTC, 9 replies.
- Failed to remove broadcast 2 with removeFromMaster = true in Graphx - posted by "Zhang, Jingyu" <ji...@news.com.au> on 2016/02/05 22:50:18 UTC, 0 replies.
- Writing to jdbc database from SparkR (1.5.2) - posted by Andrew Holway <an...@otternetworks.de> on 2016/02/06 17:19:58 UTC, 1 reply.
- Spark Streaming with Druid? - posted by unk1102 <um...@gmail.com> on 2016/02/06 20:17:34 UTC, 3 replies.
- Re: Question on how to access tuple values in spark - posted by md...@gmail.com on 2016/02/07 00:45:24 UTC, 1 reply.
- Apache Spark data locality when integrating with Kafka - posted by fanooos <de...@gmail.com> on 2016/02/07 04:54:27 UTC, 6 replies.
- Imported CSV file content isn't identical to the original file - posted by SLiZn Liu <sl...@gmail.com> on 2016/02/07 08:44:00 UTC, 10 replies.
- Re: Bad Digest error while doing aws s3 put - posted by Dhimant <dh...@gmail.com> on 2016/02/07 08:57:04 UTC, 4 replies.
- Unexpected element type class - posted by Anoop Shiralige <an...@gmail.com> on 2016/02/07 14:14:49 UTC, 0 replies.
- Re: Shuffle memory woes - posted by Sea <26...@qq.com> on 2016/02/07 15:28:15 UTC, 0 replies.
- Advise on using spark shell for Hive table sql queries - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/07 21:17:47 UTC, 0 replies.
- Handling Hive Table With large number of rows - posted by Meetu Maltiar <me...@gmail.com> on 2016/02/08 06:48:30 UTC, 2 replies.
- Extract all the values from describe - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/08 11:24:24 UTC, 1 reply.
- How to see Cassandra List / Set / Map values from Spark Hive Thrift JDBC? - posted by Matthew Johnson <ma...@algomi.com> on 2016/02/08 11:25:13 UTC, 0 replies.
- [Spark 1.5.1] percentile in spark - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/08 11:25:49 UTC, 0 replies.
- Re: how can i write map(x => x._1 ++ x._2) statement in python.?? - posted by "Yuval.Itzchakov" <yu...@gmail.com> on 2016/02/08 11:51:57 UTC, 0 replies.
- Futures timed out after [120 seconds] - posted by Andrew Milkowski <am...@gmail.com> on 2016/02/08 16:43:25 UTC, 0 replies.
- Access batch statistics in Spark Streaming - posted by Chen Song <ch...@gmail.com> on 2016/02/08 17:34:49 UTC, 1 reply.
- Dynamically Change Log Level Spark Streaming - posted by Ashish Soni <as...@gmail.com> on 2016/02/08 19:05:09 UTC, 0 replies.
- Spark LBFGS Error with ANN - posted by Hayri Volkan Agun <vo...@gmail.com> on 2016/02/08 20:02:02 UTC, 2 replies.
- Example of onEnvironmentUpdate Listener - posted by Ashish Soni <as...@gmail.com> on 2016/02/08 20:57:01 UTC, 0 replies.
- ErrorToken illegal character in a query having / @ $ . symbols - posted by Mohamed Nadjib MAMI <ma...@iai.uni-bonn.de> on 2016/02/08 22:08:45 UTC, 0 replies.
- Spark in Production - Use Cases - posted by Scott walent <sc...@gmail.com> on 2016/02/08 23:12:34 UTC, 0 replies.
- Optimal way to re-partition from a single partition - posted by Cesar Flores <ce...@gmail.com> on 2016/02/08 23:30:45 UTC, 6 replies.
- LogisticRegressionModel not able to load serialized model from S3 - posted by Utkarsh Sengar <ut...@gmail.com> on 2016/02/09 00:41:57 UTC, 1 reply.
- ALS rating caching - posted by Roberto Pagliari <ro...@asos.com> on 2016/02/09 00:48:03 UTC, 2 replies.
- [Spark Streaming] Spark Streaming dropping last lines - posted by Nipun Arora <ni...@gmail.com> on 2016/02/09 04:05:47 UTC, 2 replies.
- Long running Spark job on YARN throws "No AMRMToken" - posted by Prabhu Joseph <pr...@gmail.com> on 2016/02/09 05:34:57 UTC, 1 reply.
- Spark Job on YARN accessing Hbase Table - posted by Prabhu Joseph <pr...@gmail.com> on 2016/02/09 09:12:21 UTC, 3 replies.
- createDataFrame question - posted by jdkorigan <su...@korigan.com> on 2016/02/09 13:22:18 UTC, 3 replies.
- spark-cassandra-connector BulkOutputWriter - posted by Alexandr Dzhagriev <dz...@gmail.com> on 2016/02/09 15:52:06 UTC, 1 reply.
- [Spark Streaming] Joining Kafka and Cassandra DataFrames - posted by be...@chapter7.ch on 2016/02/09 15:58:03 UTC, 6 replies.
- Dataset joinWith condition - posted by Raghava Mutharaju <m....@gmail.com> on 2016/02/09 16:07:55 UTC, 4 replies.
- jssc.textFileStream(directory) how to ensure it read entire all incoming files - posted by unk1102 <um...@gmail.com> on 2016/02/09 17:12:44 UTC, 0 replies.
- HADOOP_HOME are not set when try to run spark application in yarn cluster mode - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/02/09 18:23:10 UTC, 2 replies.
- Re: how to send JavaDStream RDD using foreachRDD using Java - posted by unk1102 <um...@gmail.com> on 2016/02/09 18:57:05 UTC, 0 replies.
- Appropriate Apache Users List Uses - posted by John Omernik <jo...@omernik.com> on 2016/02/09 20:36:27 UTC, 3 replies.
- Spark with .NET - posted by Arko Provo Mukherjee <ar...@gmail.com> on 2016/02/09 20:43:48 UTC, 8 replies.
- spark 1.6.0 connect to hive metastore - posted by Koert Kuipers <ko...@tresata.com> on 2016/02/09 20:58:46 UTC, 7 replies.
- Spark Increase in Processing Time - posted by Bryan Jeffrey <br...@gmail.com> on 2016/02/09 21:49:09 UTC, 3 replies.
- How to collect/take arbitrary number of records in the driver? - posted by SRK <sw...@gmail.com> on 2016/02/09 22:58:18 UTC, 2 replies.
- How to do a look up by id from files in hdfs inside a transformation/action ina RDD - posted by SRK <sw...@gmail.com> on 2016/02/09 23:01:51 UTC, 0 replies.
- Learning Fails with 4 Number of Layes at ANN Training with SGDOptimizer - posted by Hayri Volkan Agun <vo...@gmail.com> on 2016/02/09 23:26:06 UTC, 1 reply.
- spark-csv partitionBy - posted by Srikanth <sr...@gmail.com> on 2016/02/10 00:28:27 UTC, 0 replies.
- How to use a register temp table inside mapPartitions of an RDD - posted by SRK <sw...@gmail.com> on 2016/02/10 02:22:13 UTC, 1 reply.
- AM creation in yarn client mode - posted by praveen S <my...@gmail.com> on 2016/02/10 05:42:05 UTC, 7 replies.
- Re: AM creation in yarn-client mode - posted by praveen S <my...@gmail.com> on 2016/02/10 06:57:37 UTC, 2 replies.
- Turning on logging for internal Spark logs - posted by Li Ming Tsai <ma...@ltsai.com> on 2016/02/10 07:06:02 UTC, 0 replies.
- Pyspark - how to use UDFs with dataframe groupby - posted by Viktor ARDELEAN <vi...@gmail.com> on 2016/02/10 08:44:15 UTC, 1 reply.
- Pyspark - How to add new column to dataframe based on existing column value - posted by Viktor ARDELEAN <vi...@gmail.com> on 2016/02/10 10:34:54 UTC, 2 replies.
- Spark : Unable to connect to Oracle - posted by Divya Gehlot <di...@gmail.com> on 2016/02/10 10:37:03 UTC, 2 replies.
- Spark Streaming : Limiting number of receivers per executor - posted by ajay garg <aj...@mobileum.com> on 2016/02/10 11:34:59 UTC, 2 replies.
- Is there a way to save csv file fast ? - posted by Eli Super <el...@gmail.com> on 2016/02/10 11:56:14 UTC, 2 replies.
- add kafka streaming jars when initialising the sparkcontext in python - posted by David Kennedy <da...@gmail.com> on 2016/02/10 13:48:25 UTC, 0 replies.
- broadcast join in SparkSQL requires analyze table noscan - posted by Lan Jiang <lj...@gmail.com> on 2016/02/10 15:31:06 UTC, 2 replies.
- Introducing spark-sklearn, a scikit-learn integration package for Spark - posted by Tim Hunter <ti...@databricks.com> on 2016/02/10 18:20:24 UTC, 0 replies.
- Kafka + Spark 1.3 Integration - posted by Nipun Arora <ni...@gmail.com> on 2016/02/10 20:28:15 UTC, 3 replies.
- supporting adoc files in spark-packages.org - posted by Kiran Chitturi <ki...@lucidworks.com> on 2016/02/10 20:53:17 UTC, 0 replies.
- retrieving all the rows with collect() - posted by mi...@cloudtechnologypartners.co.uk on 2016/02/10 21:14:12 UTC, 7 replies.
- reading ORC format on Spark-SQL - posted by Philip Lee <ph...@gmail.com> on 2016/02/10 21:39:00 UTC, 0 replies.
- newbie how to access S3 cluster created using spark-ec2 - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/10 22:20:34 UTC, 0 replies.
- legal column names - posted by Richard Cobbe <ri...@oracle.com> on 2016/02/10 22:48:13 UTC, 0 replies.
- SparkListener - why is org.apache.spark.scheduler.JobFailed in scala private? - posted by Sumona Routh <su...@gmail.com> on 2016/02/10 23:51:29 UTC, 0 replies.
- Rest API for spark - posted by Tracy Li <ti...@gmail.com> on 2016/02/11 00:37:05 UTC, 2 replies.
- RDD distribution - posted by daze5112 <da...@ato.gov.au> on 2016/02/11 00:54:16 UTC, 1 reply.
- Spark Application Master on Yarn client mode - Virtual memory limit - posted by Nirav Patel <np...@xactlycorp.com> on 2016/02/11 01:16:35 UTC, 6 replies.
- saveToCassandra doesn't overwrite column - posted by Hudong Wang <ju...@hotmail.com> on 2016/02/11 01:35:52 UTC, 0 replies.
- Spark execuotr Memory profiling - posted by Nirav Patel <np...@xactlycorp.com> on 2016/02/11 02:09:39 UTC, 6 replies.
- [MLLIB] Best way to extract RandomForest decision splits - posted by jluan <ja...@gmail.com> on 2016/02/11 02:56:40 UTC, 3 replies.
- Computing hamming distance over large data set - posted by rokclimb15 <ro...@gmail.com> on 2016/02/11 04:29:07 UTC, 4 replies.
- Passing a dataframe to where clause + Spark SQL - posted by Divya Gehlot <di...@gmail.com> on 2016/02/11 05:23:58 UTC, 1 reply.
- Spark Certification - posted by naga sharathrayapati <sh...@gmail.com> on 2016/02/11 05:36:34 UTC, 5 replies.
- RDD uses another RDD in pyspark with SPARK-5063 issue - posted by vince plum <li...@gmail.com> on 2016/02/11 08:09:42 UTC, 0 replies.
- Is this Task Scheduler Error normal? - posted by SLiZn Liu <sl...@gmail.com> on 2016/02/11 08:23:03 UTC, 0 replies.
- Getting prediction values in spark mllib - posted by Chandan Verma <ch...@citiustech.com> on 2016/02/11 09:02:04 UTC, 3 replies.
- [OT] Apache Spark Jobs in Kochi, India - posted by Andrew Holway <an...@otternetworks.de> on 2016/02/11 09:31:49 UTC, 0 replies.
- spark shell ini file - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/11 10:45:43 UTC, 1 reply.
- Inserting column to DataFrame - posted by Zsolt Tóth <to...@gmail.com> on 2016/02/11 11:12:34 UTC, 5 replies.
- Spark Streaming with Kafka: Dealing with 'slow' partitions - posted by p pathiyil <pa...@gmail.com> on 2016/02/11 14:59:56 UTC, 5 replies.
- Dataframes - posted by Gaurav Agarwal <ga...@gmail.com> on 2016/02/11 15:05:29 UTC, 4 replies.
- Scala types to StructType - posted by Fabian Böhnlein <fa...@gmail.com> on 2016/02/11 15:20:06 UTC, 6 replies.
- PySpark : couldn't pickle object of type class T - posted by Anoop Shiralige <an...@gmail.com> on 2016/02/11 15:38:46 UTC, 3 replies.
- How to parallel read files in a directory - posted by Junjie Qian <qi...@outlook.com> on 2016/02/11 18:33:24 UTC, 3 replies.
- spark thrift server transport protocol - posted by Sanjeev Verma <sa...@gmail.com> on 2016/02/11 18:34:16 UTC, 1 reply.
- cache DataFrame - posted by Gaurav Agarwal <ga...@gmail.com> on 2016/02/11 19:20:41 UTC, 2 replies.
- ApacheCon NA 2016 - Important Dates!!! - posted by Melissa Warnkin <mi...@yahoo.com.INVALID> on 2016/02/11 19:23:33 UTC, 0 replies.
- best practices? spark streaming writing output detecting disk full error - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/11 20:09:23 UTC, 3 replies.
- Spark workers disconnecting on 1.5.2 - posted by Andy Max <an...@gmail.com> on 2016/02/11 20:12:48 UTC, 3 replies.
- Re: Spark History Server pointing to S3 - posted by Vladimir Grigor <vl...@kiosked.com> on 2016/02/11 20:58:13 UTC, 0 replies.
- Stateful Operation on JavaPairDStream Help Needed !! - posted by Abhishek Anand <ab...@gmail.com> on 2016/02/11 21:40:38 UTC, 14 replies.
- Skip empty batches - spark streaming - posted by Sebastian Piu <se...@gmail.com> on 2016/02/11 22:03:29 UTC, 7 replies.
- newbie unable to write to S3 403 forbidden error - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/11 22:15:51 UTC, 5 replies.
- AmpLab Big Data Benchmark for Spark error on EC2 - posted by cheez <11...@seecs.edu.pk> on 2016/02/11 22:47:24 UTC, 0 replies.
- off-heap certain operations - posted by Ovidiu-Cristian MARCU <ov...@inria.fr> on 2016/02/11 22:51:56 UTC, 5 replies.
- Testing email please ignore - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/11 23:00:14 UTC, 0 replies.
- Question on Spark architecture and DAG - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/11 23:30:26 UTC, 2 replies.
- Spark Summit San Francisco 2016 call for presentations (CFP) - posted by Reynold Xin <rx...@apache.org> on 2016/02/11 23:52:39 UTC, 0 replies.
- Re: Building Spark with a Custom Version of Hadoop: HDFS ClassNotFoundException - posted by Ted Yu <yu...@gmail.com> on 2016/02/12 02:41:54 UTC, 1 reply.
- spark streaming job keeps failing with JVM errors - posted by Sutanu Das <sd...@att.com> on 2016/02/12 05:20:00 UTC, 2 replies.
- SparkSQL parallelism - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2016/02/12 05:45:39 UTC, 1 replies.
- mllib:Survival Analysis : assertion failed: AFTAggregator loss sum is infinity. Error for unknown reason. - posted by Stuti Awasthi <st...@hcl.com> on 2016/02/12 08:03:16 UTC, 3 replies.
- Re: off-heap certain operations - posted by Sea <26...@qq.com> on 2016/02/12 08:06:17 UTC, 0 replies.
- Spark runs only on Mesos v0.21? - posted by Petr Novak <os...@gmail.com> on 2016/02/12 09:31:16 UTC, 2 replies.
- Connection via JDBC to Oracle hangs after count call - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/12 11:45:25 UTC, 0 replies.
- Using SPARK packages in Spark Cluster - posted by Gourav Sengupta <go...@gmail.com> on 2016/02/12 13:22:35 UTC, 10 replies.
- Re: Convert Iterable to RDD - posted by "seb.arzt" <se...@gmail.com> on 2016/02/12 15:09:22 UTC, 1 replies.
- [SparkML] RandomForestModel save on disk. - posted by Eugene Morozov <ev...@gmail.com> on 2016/02/12 15:57:52 UTC, 1 replies.
- Python3 does not have Module 'UserString' - posted by Sisyphuss <zh...@gmail.com> on 2016/02/12 16:22:03 UTC, 4 replies.
- spark slate IP - posted by Christopher Bourez <ch...@gmail.com> on 2016/02/12 16:29:41 UTC, 0 replies.
- Spark Submit - posted by Ashish Soni <as...@gmail.com> on 2016/02/12 16:44:52 UTC, 4 replies.
- spark-shell throws JDBC error after load - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/12 17:16:24 UTC, 0 replies.
- spark-submit: remote protocol vs --py-files - posted by Jeff Henrikson <jh...@uw.edu> on 2016/02/12 18:23:30 UTC, 0 replies.
- Seperate Log4j.xml for Spark and Application JAR ( Application vs Spark ) - posted by Ashish Soni <as...@gmail.com> on 2016/02/12 18:32:52 UTC, 0 replies.
- SSE in s3 - posted by "Lin, Hao" <Ha...@finra.org> on 2016/02/12 18:53:52 UTC, 0 replies.
- coalesce and executor memory - posted by Christopher Brady <ch...@oracle.com> on 2016/02/12 19:13:14 UTC, 10 replies.
- Allowing parallelism in spark local mode - posted by yael aharon <ya...@gmail.com> on 2016/02/12 20:00:08 UTC, 2 replies.
- pyspark.DataFrame.dropDuplicates - posted by James Barney <ja...@gmail.com> on 2016/02/12 21:48:08 UTC, 0 replies.
- GroupedDataset flatMapGroups with sorting (aka secondary sort redux) - posted by Koert Kuipers <ko...@tresata.com> on 2016/02/12 21:56:15 UTC, 0 replies.
- _metada file throwing an "GC overhead limit exceeded" after a write - posted by Maurin Lenglart <ma...@cuberonlabs.com> on 2016/02/12 23:36:49 UTC, 0 replies.
- Spark with DF throws No suitable driver found for jdbc:oracle: after first call - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/12 23:44:23 UTC, 0 replies.
- Dataset takes more memory compared to RDD - posted by Raghava Mutharaju <m....@gmail.com> on 2016/02/13 00:22:15 UTC, 1 replies.
- How to write Array[Byte] as JPG file in Spark? - posted by Liangzhao Zeng <li...@gmail.com> on 2016/02/13 00:57:03 UTC, 0 replies.
- Sharing temporary table - posted by "max.tenerowicz" <ca...@gmail.com> on 2016/02/13 02:01:12 UTC, 0 replies.
- Dataset GroupedDataset.reduce - posted by Koert Kuipers <ko...@tresata.com> on 2016/02/13 02:01:29 UTC, 0 replies.
- Spark jobs run extremely slow on yarn cluster compared to standalone spark - posted by pdesai <pd...@cloudfabrix.com> on 2016/02/13 02:19:17 UTC, 1 replies.
- org.apache.spark.sql.AnalysisException: undefined function lit; - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/13 03:19:13 UTC, 2 replies.
- support vector machine does not classify properly? - posted by prem09 <pr...@hotmail.com> on 2016/02/13 03:47:19 UTC, 1 replies.
- new to Spark - trying to get a basic example to run - could use some help - posted by "Taylor, Ronald C" <Ro...@pnnl.gov> on 2016/02/13 04:14:02 UTC, 2 replies.
- Saving an Image file using binary Files - pyspark - posted by Sainath Palla <pa...@gmail.com> on 2016/02/13 10:54:00 UTC, 0 replies.
- jdbc driver used by spark fails folowing first stage - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/13 16:24:50 UTC, 0 replies.
- Unrecognized VM option 'MaxPermSize=512M' - posted by Milad khajavi <kh...@gmail.com> on 2016/02/13 16:34:17 UTC, 2 replies.
- RE: jdbc driver used by spark fails following first stage, solved it - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/13 17:20:20 UTC, 0 replies.
- Best practises of share Spark cluster over few applications - posted by Eugene Morozov <ev...@gmail.com> on 2016/02/13 17:40:37 UTC, 4 replies.
- Re: Write spark eventLog to both HDFS and local FileSystem - posted by nsalian <ns...@cloudera.com> on 2016/02/13 18:54:15 UTC, 0 replies.
- using udf to convert Oracle number column in Data Frame - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/13 18:55:36 UTC, 2 replies.
- Re: How to use scala.math.Ordering in java - posted by shcher <sh...@gmail.com> on 2016/02/13 21:20:55 UTC, 0 replies.
- GroupedDataset needs a mapValues - posted by Koert Kuipers <ko...@tresata.com> on 2016/02/13 22:35:58 UTC, 4 replies.
- How to store documents in hdfs and query them by id using Hive/Spark SQL - posted by SRK <sw...@gmail.com> on 2016/02/14 00:44:40 UTC, 0 replies.
- How to query a Hive table by Id from inside map partitions - posted by SRK <sw...@gmail.com> on 2016/02/14 01:14:39 UTC, 0 replies.
- Joining three tables with data frames - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/14 02:28:53 UTC, 2 replies.
- Re: Worker's BlockManager Folder not getting cleared - posted by Abhishek Anand <ab...@gmail.com> on 2016/02/14 08:04:53 UTC, 1 replies.
- Using explain plan to optimize sql query - posted by Mr rty ff <ya...@yahoo.com.INVALID> on 2016/02/14 09:53:44 UTC, 0 replies.
- Spark Error: Not enough space to cache partition rdd - posted by gustavolacerdas <gu...@gmail.com> on 2016/02/14 20:49:40 UTC, 1 replies.
- Trying to join a registered Hive table as temporary with two Oracle tables registered as temporary in Spark - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/14 21:22:10 UTC, 4 replies.
- Running synchronized JRI code - posted by Simon Hafner <re...@gmail.com> on 2016/02/14 22:09:28 UTC, 5 replies.
- Passing multiple jar files to spark-shell - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/15 00:35:27 UTC, 5 replies.
- IllegalStateException : When use --executor-cores option in YARN - posted by Divya Gehlot <di...@gmail.com> on 2016/02/15 03:36:15 UTC, 1 replies.
- Difference between spark-shell and spark-submit.Which one to use when ? - posted by Divya Gehlot <di...@gmail.com> on 2016/02/15 04:28:37 UTC, 1 replies.
- Best way to bring up Spark with Cassandra (and Elasticsearch) in production. - posted by Kevin Burton <bu...@spinn3r.com> on 2016/02/15 04:51:57 UTC, 1 replies.
- which master option to view current running job in Spark UI - posted by Divya Gehlot <di...@gmail.com> on 2016/02/15 05:40:01 UTC, 3 replies.
- How to query a hive table from inside a map in Spark - posted by SRK <sw...@gmail.com> on 2016/02/15 05:40:50 UTC, 1 replies.
- Unable to insert overwrite table with Spark 1.5.2 - posted by Ramanathan R <ra...@gmail.com> on 2016/02/15 06:29:14 UTC, 1 replies.
- How to join an RDD with a hive table? - posted by SRK <sw...@gmail.com> on 2016/02/15 06:53:29 UTC, 5 replies.
- Spark worker abruptly dying after 2 days - posted by Kartik Mathur <ka...@bluedata.com> on 2016/02/15 07:21:50 UTC, 4 replies.
- Need help :Does anybody has HDP cluster on EC2? - posted by Divya Gehlot <di...@gmail.com> on 2016/02/15 09:25:29 UTC, 5 replies.
- How to add kafka streaming jars when initialising the sparkcontext in python - posted by David Kennedy <da...@gmail.com> on 2016/02/15 10:35:07 UTC, 1 replies.
- New line lost in streaming output file - posted by Ashutosh Kumar <km...@gmail.com> on 2016/02/15 11:09:37 UTC, 6 replies.
- Spark DataFrameNaFunctions unrecognized - posted by satish chandra j <js...@gmail.com> on 2016/02/15 12:36:44 UTC, 4 replies.
- Single context Spark from Python and Scala - posted by Leonid Blokhin <lb...@provectus.com> on 2016/02/15 13:10:01 UTC, 1 replies.
- Re: temporary tables created by registerTempTable() - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/15 15:54:53 UTC, 2 replies.
- More than one StateSpec in the same application - posted by Udo Fholl <ud...@gmail.com> on 2016/02/15 16:40:53 UTC, 0 replies.
- Memory problems and missing heartbeats - posted by JOAQUIN GUANTER GONZALBEZ <jo...@telefonica.com> on 2016/02/15 16:42:35 UTC, 6 replies.
- SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext - posted by Sumona Routh <su...@gmail.com> on 2016/02/15 16:59:35 UTC, 3 replies.
- Subscribe - posted by Jayesh Thakrar <j_...@yahoo.com.INVALID> on 2016/02/15 17:01:50 UTC, 0 replies.
- Check if column exists in Schema - posted by Sebastian Piu <se...@gmail.com> on 2016/02/15 20:17:46 UTC, 3 replies.
- [ANNOUNCE] Apache SystemML 0.9.0-incubating released - posted by Luciano Resende <lr...@apache.org> on 2016/02/15 20:34:15 UTC, 0 replies.
- Is predicate push-down supported by default in dataframes? - posted by SRK <sw...@gmail.com> on 2016/02/15 20:43:35 UTC, 0 replies.
- How to partition a dataframe based on an Id? - posted by SRK <sw...@gmail.com> on 2016/02/15 20:57:21 UTC, 0 replies.
- recommendations with duplicate ratings - posted by Roberto Pagliari <ro...@asos.com> on 2016/02/15 21:30:06 UTC, 4 replies.
- caching ratigs with ALS implicit - posted by Roberto Pagliari <ro...@asos.com> on 2016/02/15 22:21:44 UTC, 1 replies.
- Working out the optimizer matrix in Spark - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/15 22:54:29 UTC, 0 replies.
- Out of Memory error caused by output object in mapPartitions - posted by nitinkak001 <ni...@gmail.com> on 2016/02/15 23:26:39 UTC, 0 replies.
- Migrating Transformers from Spark 1.3.1 to 1.5.0 - posted by Cesar Flores <ce...@gmail.com> on 2016/02/16 01:34:57 UTC, 1 replies.
- How to run Scala file examples in spark 1.5.2 - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/16 02:24:18 UTC, 2 replies.
- Re: Text search in Spark on compressed bz2 files - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/16 02:27:13 UTC, 0 replies.
- Spark on Windows - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/02/16 03:18:42 UTC, 1 replies.
- which is better RDD or Dataframe? - posted by Divya Gehlot <di...@gmail.com> on 2016/02/16 03:43:35 UTC, 1 replies.
- IllegalArgumentException UnsatisfiedLinkError snappy-1.1.2 spark-shell error - posted by Paolo Villaflores <pb...@gmail.com> on 2016/02/16 04:09:13 UTC, 2 replies.
- SparkSQL/DataFrame - Is `JOIN USING` syntax null-safe? - posted by Zhong Wang <wa...@gmail.com> on 2016/02/16 04:25:21 UTC, 1 replies.
- Getting java.lang.IllegalArgumentException: requirement failed while calling Sparks MLLIB StreamingKMeans from java application - posted by Yogesh Vyas <in...@gmail.com> on 2016/02/16 05:16:34 UTC, 0 replies.
- Creating HiveContext in Spark-Shell fails - posted by Prabhu Joseph <pr...@gmail.com> on 2016/02/16 05:51:51 UTC, 3 replies.
- Side effects of using var inside a class object in a Rdd - posted by Hemalatha A <he...@googlemail.com> on 2016/02/16 05:53:19 UTC, 2 replies.
- Error when doing a SaveAstable on a Spark dataframe - posted by SRK <sw...@gmail.com> on 2016/02/16 07:46:30 UTC, 0 replies.
- Saving Kafka Offsets to Cassandra at begining of each batch in Spark Streaming - posted by Abhishek Anand <ab...@gmail.com> on 2016/02/16 08:15:55 UTC, 4 replies.
- Abnormally large deserialisation time for some tasks - posted by Abhishek Modi <ab...@gmail.com> on 2016/02/16 09:50:15 UTC, 0 replies.
- Stored proc with spark - posted by Gaurav Agarwal <ga...@gmail.com> on 2016/02/16 10:04:18 UTC, 6 replies.
- Unusually large deserialisation time - posted by Abhishek Modi <ab...@gmail.com> on 2016/02/16 10:12:43 UTC, 6 replies.
- reading spark dataframe in python - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/02/16 10:59:54 UTC, 1 replies.
- Submit custom python packages from current project - posted by Mohannad Ali <ma...@gmail.com> on 2016/02/16 11:03:21 UTC, 3 replies.
- Scala from Jupyter - posted by AlexModestov <Al...@gmail.com> on 2016/02/16 12:19:24 UTC, 8 replies.
- How to debug spark-core with function call stack? - posted by DaeJin Jung <ha...@gmail.com> on 2016/02/16 13:57:31 UTC, 0 replies.
- Use case for RDD and Data Frame - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/16 17:05:44 UTC, 5 replies.
- In Spark Dataframes, does dropDuplicates retain the first row? - posted by tmoffwood <th...@growthintel.com> on 2016/02/16 17:33:39 UTC, 0 replies.
- Frustration over Spark and Jackson - posted by Martin Skøtt <ma...@z3n.dk> on 2016/02/16 18:08:08 UTC, 2 replies.
- How to use a custom partitioner in a dataframe in Spark - posted by SRK <sw...@gmail.com> on 2016/02/16 19:21:55 UTC, 6 replies.
- Fair Scheduler Pools with Kafka Streaming - posted by p pathiyil <pa...@gmail.com> on 2016/02/16 19:33:39 UTC, 1 replies.
- Spark SQL step with many tasks takes a long time to begin processing - posted by "Dukek, Dillon" <Di...@T-Mobile.com> on 2016/02/16 19:59:46 UTC, 3 replies.
- Spark null pointer exception and task failure - posted by Bijuna <bi...@gmail.com> on 2016/02/16 21:03:18 UTC, 1 replies.
- How to delete a record from parquet files using dataframes - posted by SRK <sw...@gmail.com> on 2016/02/16 22:11:24 UTC, 2 replies.
- spark examples Analytics ConnectedComponents - keep running, nothing in output - posted by Ovidiu-Cristian MARCU <ov...@inria.fr> on 2016/02/16 22:19:42 UTC, 0 replies.
- Optimize the performance of inserting data to Cassandra with Kafka and Spark Streaming - posted by Jerry <je...@gmail.com> on 2016/02/16 22:29:33 UTC, 2 replies.
- Lost executors failed job unable to execute spark examples Triangle Count (Analytics triangles) - posted by Ovidiu-Cristian MARCU <ov...@inria.fr> on 2016/02/16 22:50:53 UTC, 0 replies.
- streaming application redundant dag stage execution/performance/caching - posted by krishna ramachandran <ra...@s1776.com> on 2016/02/16 23:32:22 UTC, 0 replies.
- Spark Streaming with Kafka DirectStream - posted by Cyril Scetbon <cy...@free.fr> on 2016/02/17 02:18:21 UTC, 5 replies.
- How to update data saved as parquet in hdfs using Dataframes - posted by SRK <sw...@gmail.com> on 2016/02/17 03:45:46 UTC, 2 replies.
- Spark Streaming with Kafka Use Case - posted by Abhishek Anand <ab...@gmail.com> on 2016/02/17 07:57:27 UTC, 6 replies.
- cartesian with Dataset - posted by Alex Dzhagriev <dz...@gmail.com> on 2016/02/17 10:08:49 UTC, 1 replies.
- SparkOnHBase : Which version of Spark its available - posted by Divya Gehlot <di...@gmail.com> on 2016/02/17 10:44:55 UTC, 4 replies.
- Calender Obj to java.util.date conversion issue - posted by satish chandra j <js...@gmail.com> on 2016/02/17 15:04:07 UTC, 0 replies.
- listening to recursive folder structures in s3 using pyspark streaming (textFileStream) - posted by in4maniac <sa...@skimlinks.com> on 2016/02/17 16:20:38 UTC, 2 replies.
- Error when executing Spark application on YARN - posted by alvarobrandon <al...@gmail.com> on 2016/02/17 16:39:44 UTC, 4 replies.
- Why no computations run on workers/slaves in cluster mode? - posted by Junjie Qian <qi...@outlook.com> on 2016/02/17 17:20:11 UTC, 2 replies.
- data type transform when creating an RDD object - posted by "Lin, Hao" <Ha...@finra.org> on 2016/02/17 17:47:20 UTC, 1 replies.
- Getting out of memory error during coalesce - posted by Anubhav Agarwal <an...@gmail.com> on 2016/02/17 19:57:02 UTC, 0 replies.
- trouble using Aggregator with DataFrame - posted by Koert Kuipers <ko...@tresata.com> on 2016/02/17 20:22:36 UTC, 2 replies.
- Yarn client mode: Setting environment variables - posted by Lin Zhao <li...@exabeam.com> on 2016/02/17 20:31:54 UTC, 3 replies.
- Streaming with broadcast joins - posted by Srikanth <sr...@gmail.com> on 2016/02/17 22:13:07 UTC, 11 replies.
- Running multiple foreach loops - posted by Daniel Imberman <da...@gmail.com> on 2016/02/17 22:30:29 UTC, 4 replies.
- Problem mixing MESOS Cluster Mode and Docker task execution - posted by "g.eynard.bontemps@gmail.com" <g....@gmail.com> on 2016/02/17 23:00:46 UTC, 0 replies.
- Importing csv files into Hive ORC target table - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/17 23:43:54 UTC, 0 replies.
- Re: Importing csv files into Hive ORC target table - posted by Alex Dzhagriev <dz...@gmail.com> on 2016/02/17 23:58:07 UTC, 2 replies.
- pyspark take function error while count() and collect() are working fine - posted by Msr Msr <ms...@gmail.com> on 2016/02/18 01:12:32 UTC, 0 replies.
- Memory issues on spark - posted by AR...@cognizant.com on 2016/02/18 02:02:15 UTC, 3 replies.
- Opaque error in Spark - Windows - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/02/18 02:43:13 UTC, 0 replies.
- adding a split and union to a streaming application cause big performance hit - posted by ramach1776 <ra...@s1776.com> on 2016/02/18 03:03:50 UTC, 3 replies.
- spark stages in parallel - posted by Shushant Arora <sh...@gmail.com> on 2016/02/18 08:49:26 UTC, 1 replies.
- How to train and predict in parallel via Spark MLlib? - posted by "Igor L." <ta...@gmail.com> on 2016/02/18 10:28:23 UTC, 3 replies.
- Reading CSV file using pyspark - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/02/18 11:05:28 UTC, 2 replies.
- How do I stream in Parquet files using fileStream() and ParquetInputFormat - posted by roryofbyrne <ro...@gmail.com> on 2016/02/18 11:40:59 UTC, 0 replies.
- Re: Is stddev not a supported aggregation function in SparkSQL WindowSpec? - posted by rok <ro...@gmail.com> on 2016/02/18 12:03:56 UTC, 0 replies.
- explaination for parent.slideDuration in ReducedWindowedDStream - posted by Sachin Aggarwal <di...@gmail.com> on 2016/02/18 12:14:02 UTC, 0 replies.
- Is this likely to cause any problems? - posted by James Hammerton <ja...@gluru.co> on 2016/02/18 12:39:34 UTC, 13 replies.
- How do I stream in Parquet files using fileStream() and ParquetInputFormat? - posted by Rory Byrne <ro...@gmail.com> on 2016/02/18 13:05:54 UTC, 0 replies.
- SPARK-9559 - posted by Ashish Soni <as...@gmail.com> on 2016/02/18 16:13:32 UTC, 2 replies.
- equalTo isin not working as expected with a constructed column with DataFrames - posted by Mehdi Ben Haj Abbes <me...@gmail.com> on 2016/02/18 16:20:36 UTC, 2 replies.
- SPARK REST API on YARN - posted by alvarobrandon <al...@gmail.com> on 2016/02/18 16:56:08 UTC, 1 replies.
- Re: Is stddev not a supported aggregation function in SparkSQLWindowSpec? - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/18 16:58:09 UTC, 0 replies.
- UDAF support for DataFrames in Spark 1.5.0? - posted by Richard Cobbe <ri...@oracle.com> on 2016/02/18 17:31:53 UTC, 3 replies.
- SparkConf does not work for spark.driver.memory - posted by wgtmac <us...@gmail.com> on 2016/02/18 19:26:22 UTC, 1 replies.
- Access to broadcasted variable - posted by jeff saremi <je...@hotmail.com> on 2016/02/18 20:44:07 UTC, 4 replies.
- Lazy executors - posted by Bemaze <b....@gmail.com> on 2016/02/18 21:57:22 UTC, 0 replies.
- subtractByKey increases RDD size in memory - any ideas? - posted by DaPsul <da...@gmx.de> on 2016/02/18 22:37:56 UTC, 2 replies.
- Hive REGEXP_REPLACE use or equivalent in Spark - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/18 23:09:51 UTC, 5 replies.
- Spark History Server NOT showing Jobs with Hortonworks - posted by Sutanu Das <sd...@att.com> on 2016/02/18 23:22:08 UTC, 5 replies.
- JDBC based access to RDD - posted by Shyam Sarkar <ss...@gmail.com> on 2016/02/18 23:36:24 UTC, 3 replies.
- spark 1.6 new memory management - some issues with tasks not using all executors - posted by Koert Kuipers <ko...@tresata.com> on 2016/02/19 00:51:24 UTC, 6 replies.
- cannot coerce class "data.frame" to a DataFrame - with spark R - posted by roni <ro...@gmail.com> on 2016/02/19 01:54:53 UTC, 2 replies.
- StreamingKMeans does not update cluster centroid locations - posted by ramach1776 <ra...@s1776.com> on 2016/02/19 02:59:42 UTC, 6 replies.
- Spark JDBC connection - data writing success or failure cases - posted by Divya Gehlot <di...@gmail.com> on 2016/02/19 03:35:42 UTC, 6 replies.
- Using sbt assembly - posted by Arko Provo Mukherjee <ar...@gmail.com> on 2016/02/19 03:50:35 UTC, 1 replies.
- Logistic Regression using ML Pipeline - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/19 06:27:01 UTC, 1 replies.
- Concurreny does not improve for Spark Jobs with Same Spark Context - posted by Prabhu Joseph <pr...@gmail.com> on 2016/02/19 06:51:35 UTC, 3 replies.
- How to get the code for class in spark - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/19 10:43:31 UTC, 3 replies.
- Read files dynamically having different schema under one parent directory + scala + Spakr 1.5,2 - posted by Divya Gehlot <di...@gmail.com> on 2016/02/19 11:14:34 UTC, 3 replies.
- Meetup in Rome - posted by Domenico Pontari <do...@gmail.com> on 2016/02/19 11:37:48 UTC, 1 replies.
- Re: Accessing Web UI - posted by vasbhat <va...@gmail.com> on 2016/02/19 12:18:51 UTC, 9 replies.
- an error when I read data from parquet - posted by AlexModestov <Al...@gmail.com> on 2016/02/19 13:59:40 UTC, 1 replies.
- Adding vertex to a graph in graphx is taking more time in subsequent addition - posted by Udbhav Agarwal <ud...@syncoms.com> on 2016/02/19 14:45:12 UTC, 0 replies.
- Spark stream job is take up /TMP with 100% - posted by Sutanu Das <sd...@att.com> on 2016/02/19 16:15:50 UTC, 1 replies.
- install databricks csv package for spark - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/19 16:26:19 UTC, 2 replies.
- Spark Random Forest Memory issues - posted by Ewan Higgs <ew...@ugent.be> on 2016/02/19 17:26:22 UTC, 0 replies.
- Spark Job Hanging on Join - posted by Tamara Mendt <tm...@hellofresh.com> on 2016/02/19 18:31:05 UTC, 10 replies.
- Communication between two spark streaming Job - posted by Ashish Soni <as...@gmail.com> on 2016/02/19 20:48:31 UTC, 2 replies.
- Submitting Jobs Programmatically - posted by Arko Provo Mukherjee <ar...@gmail.com> on 2016/02/20 02:56:34 UTC, 7 replies.
- Checking for null values when mapping - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/20 09:24:01 UTC, 10 replies.
- spark.driver.maxResultSize doesn't work in conf-file - posted by AlexModestov <Al...@gmail.com> on 2016/02/20 15:40:56 UTC, 1 replies.
- Spark Streaming: Is it possible to schedule multiple active batches? - posted by Jorge Rodriguez <jo...@bloomreach.com> on 2016/02/20 20:24:35 UTC, 1 replies.
- Fair scheduler pool details - posted by Eugene Morozov <ev...@gmail.com> on 2016/02/21 01:14:08 UTC, 1 replies.
- Constantly increasing Spark streaming heap memory - posted by Walid LEZZAR <wa...@gmail.com> on 2016/02/21 01:37:01 UTC, 1 replies.
- how to set database in DataFrame.saveAsTable? - posted by Glen <cn...@gmail.com> on 2016/02/21 02:55:30 UTC, 5 replies.
- Element appear in both 2 splits of RDD after using randomSplit - posted by tuan3w <ne...@gmail.com> on 2016/02/21 04:01:10 UTC, 2 replies.
- Behind the scene of RDD to DataFrame - posted by Weiwei Zhang <wz...@dons.usfca.edu> on 2016/02/21 07:18:43 UTC, 2 replies.
- filter by dict() key in pySpark - posted by Franc Carter <fr...@gmail.com> on 2016/02/21 12:41:10 UTC, 1 replies.
- Fwd: Evaluating spark streaming use case - posted by Jatin Kumar <jk...@rocketfuelinc.com.INVALID> on 2016/02/21 12:54:11 UTC, 4 replies.
- spark-xml can't recognize schema - posted by Prathamesh Dharangutte <pr...@gmail.com> on 2016/02/21 14:19:04 UTC, 5 replies.
- RDD[org.apache.spark.sql.Row] filter ERROR - posted by Tenghuan He <te...@gmail.com> on 2016/02/21 15:42:18 UTC, 3 replies.
- Stream group by - posted by Vinti Maheshwari <vi...@gmail.com> on 2016/02/21 18:05:10 UTC, 7 replies.
- Specify number of executors in standalone cluster mode - posted by Saiph Kappa <sa...@gmail.com> on 2016/02/21 18:31:03 UTC, 1 replies.
- Spark SQL is not returning records for hive bucketed tables on HDP - posted by "@Sanjiv Singh" <sa...@gmail.com> on 2016/02/22 04:27:20 UTC, 12 replies.
- Error :Type mismatch error when passing hdfs file path to spark-csv load method - posted by Divya Gehlot <di...@gmail.com> on 2016/02/22 05:45:28 UTC, 1 replies.
- [Please Help] Log redirection on EMR - posted by HARSH TAKKAR <ta...@gmail.com> on 2016/02/22 07:44:30 UTC, 2 replies.
- How to start spark streaming application with recent past timestamp for replay of old batches? - posted by ashokkumar rajendran <as...@gmail.com> on 2016/02/22 07:48:52 UTC, 1 replies.
- [Example] : read custom schema from file - posted by Divya Gehlot <di...@gmail.com> on 2016/02/22 08:40:48 UTC, 2 replies.
- Sample project on Image Processing - posted by "Mishra, Abhishek" <Ab...@xerox.com> on 2016/02/22 09:23:41 UTC, 4 replies.
- Loading file into executor classpath - posted by Amjad ALSHABANI <as...@gmail.com> on 2016/02/22 09:32:04 UTC, 0 replies.
- Kafka streaming receiver approach - new topic not read from beginning - posted by Paul Leclercq <pa...@tabmo.io> on 2016/02/22 10:52:09 UTC, 3 replies.
- [Cassandra-Connector] No Such Method Error despite correct versions - posted by Jan Algermissen <al...@icloud.com> on 2016/02/22 12:13:39 UTC, 1 replies.
- a new FileFormat 5x~100x faster than parquet - posted by 开心延年 <mu...@qq.com> on 2016/02/22 12:14:09 UTC, 0 replies.
- Re: a new FileFormat 5x~100x faster than parquet - posted by 平平 <xu...@qq.com> on 2016/02/22 12:46:20 UTC, 1 replies.
- Re: a new FileFormat 5x~100x faster than parquet - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2016/02/22 13:12:46 UTC, 0 replies.
- Re: a new FileFormat 5x~100x faster than parquet - posted by 开心延年 <mu...@qq.com> on 2016/02/22 13:27:34 UTC, 2 replies.
- How to add a typesafe config file which is located on HDFS to spark-submit (cluster-mode)? - posted by Jobs <jo...@gmail.com> on 2016/02/22 14:22:31 UTC, 1 replies.
- Re: Re: a new FileFormat 5x~100x faster than parquet - posted by 开心延年 <mu...@qq.com> on 2016/02/22 15:03:36 UTC, 1 replies.
- java.io.IOException: java.lang.reflect.InvocationTargetException on new spark machines - posted by Abhishek Anand <ab...@gmail.com> on 2016/02/22 15:12:23 UTC, 4 replies.
- an OOM while persist as DISK_ONLY - posted by Alex Dzhagriev <dz...@gmail.com> on 2016/02/22 15:12:36 UTC, 0 replies.
- Spark Streaming not reading input coming from the other ip - posted by Vinti Maheshwari <vi...@gmail.com> on 2016/02/22 15:38:22 UTC, 3 replies.
- Option[Long] parameter in case class parsed from JSON DataFrame failing when key not present in JSON - posted by Anthony Brew <at...@gmail.com> on 2016/02/22 15:42:48 UTC, 3 replies.
- Spark Cache Eviction - posted by Pietro Gentile <pi...@gmail.com> on 2016/02/22 15:43:21 UTC, 1 replies.
- Can we load csv partitioned data into one DF? - posted by Sa...@wellsfargo.com on 2016/02/22 16:25:01 UTC, 2 replies.
- Re: Can we load csv partitioned data into oneDF? - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/22 16:45:30 UTC, 0 replies.
- DataFrame and char encoding - posted by jdkorigan <su...@korigan.com> on 2016/02/22 18:01:16 UTC, 0 replies.
- Re: Does Spark satisfy my requirements? - posted by Chitturi Padma <le...@gmail.com> on 2016/02/22 19:10:32 UTC, 0 replies.
- map operation clears custom partitioner - posted by Brian London <br...@gmail.com> on 2016/02/22 19:21:52 UTC, 2 replies.
- Read from kafka after application is restarted - posted by vaibhavrtk1 <le...@gmail.com> on 2016/02/22 20:06:00 UTC, 6 replies.
- Re: Left/Right Outer join on multiple Columns - posted by Abhisheks <sm...@gmail.com> on 2016/02/22 21:41:01 UTC, 1 replies.
- Newbie questions regarding log processing - posted by Philippe de Rochambeau <ph...@free.fr> on 2016/02/22 22:13:30 UTC, 4 replies.
- Serializing collections in Datasets - posted by Daniel Siegmann <da...@teamaol.com> on 2016/02/22 22:51:30 UTC, 3 replies.
- DirectFileOutputCommiter - posted by "igor.berman" <ig...@gmail.com> on 2016/02/22 23:18:39 UTC, 14 replies.
- Streaming mapWithState API has NullPointerException - posted by Aris <ar...@gmail.com> on 2016/02/22 23:29:34 UTC, 3 replies.
- Using functional programming rather than SQL - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/23 00:16:45 UTC, 18 replies.
- Variable performance in Spark threads - posted by "alberto.scolari" <al...@polimi.it> on 2016/02/23 00:39:29 UTC, 0 replies.
- SparkMaster IP - posted by Arko Provo Mukherjee <ar...@gmail.com> on 2016/02/23 02:09:31 UTC, 2 replies.
- Force Partitioner to use entire entry of PairRDD as key - posted by jluan <ja...@gmail.com> on 2016/02/23 02:15:53 UTC, 4 replies.
- Checkpointing with Kafka streaming - posted by p pathiyil <pa...@gmail.com> on 2016/02/23 03:26:56 UTC, 0 replies.
- Spark UI documentaton needed - posted by Ajay Gupta <gu...@gmail.com> on 2016/02/23 03:54:42 UTC, 1 replies.
- [Example] : Save dataframes with different schema + Spark 1.5.2 and Dataframe + Spark-CSV package - posted by Divya Gehlot <di...@gmail.com> on 2016/02/23 04:34:55 UTC, 0 replies.
- spark 1.6 Not able to start spark - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/23 06:38:54 UTC, 4 replies.
- Spark Streaming - graceful shutdown when stream has no more data - posted by Femi Anthony <fe...@gmail.com> on 2016/02/23 09:25:17 UTC, 6 replies.
- Dataset sorting - posted by Oliver Beattie <ol...@obeattie.com> on 2016/02/23 10:01:36 UTC, 0 replies.
- [Proposal] Enabling time series analysis on spark metrics - posted by Karan Kumar <ka...@gmail.com> on 2016/02/23 10:29:55 UTC, 0 replies.
- PySpark Pickle reading does not find module - posted by Fabian Böhnlein <fa...@gmail.com> on 2016/02/23 10:45:14 UTC, 0 replies.
- Percentile calculation in spark 1.6 - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/23 11:08:01 UTC, 2 replies.
- pandas dataframe to spark csv - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/02/23 13:03:52 UTC, 2 replies.
- Reindexing in graphx - posted by Udbhav Agarwal <ud...@syncoms.com> on 2016/02/23 13:18:56 UTC, 6 replies.
- Query Kafka Partitions from Spark SQL - posted by Abhishek Anand <ab...@gmail.com> on 2016/02/23 13:52:28 UTC, 0 replies.
- Use maxmind geoip lib to process ip on Spark/Spark Streaming - posted by Zhun Shen <sh...@gmail.com> on 2016/02/23 14:28:18 UTC, 1 replies.
- Re: Use maxmind geoip lib to process ip on Spark/Spark Streaming - posted by Romain Sagean <ro...@hupi.fr> on 2016/02/23 15:07:47 UTC, 2 replies.
- reasonable number of executors - posted by Alex Dzhagriev <dz...@gmail.com> on 2016/02/23 15:49:00 UTC, 3 replies.
- Fast way to parse JSON in Spark - posted by Jerry <je...@gmail.com> on 2016/02/23 18:02:45 UTC, 0 replies.
- Calculation of histogram bins and frequency in Apache spark 1.6 - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/02/23 18:13:57 UTC, 2 replies.
- Count job stalling at shuffle stage on 3.4TB input (but only 5.3GB shuffle write) - posted by James Hammerton <ja...@gluru.co> on 2016/02/23 19:22:32 UTC, 0 replies.
- How to get progress information of an RDD operation - posted by "Wang, Ningjun (LNG-NPV)" <ni...@lexisnexis.com> on 2016/02/23 19:53:11 UTC, 3 replies.
- value from groubBy paired rdd - posted by "Mishra, Abhishek" <Ab...@xerox.com> on 2016/02/23 20:26:26 UTC, 2 replies.
- Spark standalone peer2peer network - posted by tdelacour <td...@seas.upenn.edu> on 2016/02/23 20:28:27 UTC, 3 replies.
- Association with remote system [akka.tcp://. . .] has failed - posted by Jeff Henrikson <jh...@uw.edu> on 2016/02/23 22:10:24 UTC, 1 replies.
- Network Spark Streaming from multiple remote hosts - posted by Vinti Maheshwari <vi...@gmail.com> on 2016/02/23 22:13:20 UTC, 1 replies.
- Apache Arrow + Spark examples? - posted by Robert Towne <Ro...@WebTrends.com> on 2016/02/23 22:21:03 UTC, 2 replies.
- Spark 1.5.2, DataFrame broadcast join, OOM - posted by Yong Zhang <ja...@hotmail.com> on 2016/02/23 23:44:49 UTC, 0 replies.
- Re: Error decompressing .gz source data files - posted by rheras <rh...@gmail.com> on 2016/02/24 00:36:37 UTC, 0 replies.
- Performing multiple aggregations over the same data - posted by Daniel Imberman <da...@gmail.com> on 2016/02/24 01:49:37 UTC, 2 replies.
- How to join multiple tables and use subqueries in Spark SQL using sqlContext? - posted by SRK <sw...@gmail.com> on 2016/02/24 02:01:28 UTC, 2 replies.
- spark-1.6.0-bin-hadoop2.6/ec2/spark-ec2 uses old version of hadoop - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/24 02:22:46 UTC, 0 replies.
- metrics not reported by spark-cassandra-connector - posted by Sa Xiao <sa...@gmail.com> on 2016/02/24 02:24:10 UTC, 2 replies.
- streaming spark is writing results to S3 a good idea? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/24 02:27:38 UTC, 2 replies.
- About Tensor Factorization in Spark - posted by Li Jiajia <ji...@gatech.edu> on 2016/02/24 04:50:29 UTC, 6 replies.
- how to interview spark developers - posted by charles li <ch...@gmail.com> on 2016/02/24 07:07:32 UTC, 1 reply.
- [Vote] : Spark-csv 1.3 + Spark 1.5.2 - Error parsing null values except String data type - posted by Divya Gehlot <di...@gmail.com> on 2016/02/24 07:36:22 UTC, 0 replies.
- spark.local.dir configuration - posted by Jung <jb...@naver.com> on 2016/02/24 09:13:57 UTC, 1 reply.
- Kafka partition increased while Spark Streaming is running - posted by 陈宇航 <yu...@foxmail.com> on 2016/02/24 09:15:40 UTC, 1 reply.
- Execution plan in spark - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/24 10:16:15 UTC, 4 replies.
- How to achieve co-location of task and source data - posted by okoeth <ok...@de.ibm.com> on 2016/02/24 10:41:17 UTC, 0 replies.
- No event log in /tmp/spark-events - posted by PatrickYu <ha...@gmail.com> on 2016/02/24 11:28:12 UTC, 2 replies.
- [Query] : How to read null values in Spark 1.5.2 - posted by Divya Gehlot <di...@gmail.com> on 2016/02/24 12:04:10 UTC, 0 replies.
- LDA topic Modeling spark + python - posted by "Mishra, Abhishek" <Ab...@xerox.com> on 2016/02/24 12:42:16 UTC, 3 replies.
- Re: Using Spark functional programming rather than SQL, Spark on Hive tables - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/24 17:20:22 UTC, 3 replies.
- Implementing random walk in spark - posted by naveenkumarmarri <na...@gmail.com> on 2016/02/24 17:37:16 UTC, 2 replies.
- Spark and KafkaUtils - posted by Vinti Maheshwari <vi...@gmail.com> on 2016/02/24 18:14:30 UTC, 8 replies.
- Re: rdd.collect.foreach() vs rdd.collect.map() - posted by Chitturi Padma <le...@gmail.com> on 2016/02/24 18:47:51 UTC, 1 reply.
- Re: Restricting number of cores not resulting in reduction in parallelism - posted by Chitturi Padma <le...@gmail.com> on 2016/02/24 19:00:36 UTC, 3 replies.
- Spark-avro issue in 1.5.2 - posted by Ro...@thomsonreuters.com on 2016/02/24 21:08:12 UTC, 3 replies.
- Executor metrics - posted by Sudo User <su...@gmail.com> on 2016/02/24 21:50:05 UTC, 0 replies.
- coalesce executor memory explosion - posted by Christopher Brady <ch...@oracle.com> on 2016/02/24 22:31:25 UTC, 0 replies.
- Spark Summit (San Francisco, June 6-8) call for presentation due in less than week - posted by Reynold Xin <rx...@apache.org> on 2016/02/24 22:50:18 UTC, 0 replies.
- How could I do this algorithm in Spark? - posted by Guillermo Ortiz <ko...@gmail.com> on 2016/02/24 23:26:16 UTC, 10 replies.
- Filter on a column having multiple values - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/24 23:40:26 UTC, 3 replies.
- Error reading a CSV - posted by skunkwerk <sk...@gmail.com> on 2016/02/24 23:42:10 UTC, 2 replies.
- How to Exploding a Map[String,Int] column in a DataFrame (Scala) - posted by Anthony Brew <at...@gmail.com> on 2016/02/25 00:06:10 UTC, 3 replies.
- Re: Spark + Sentry + Kerberos don't add up? - posted by Ruslan Dautkhanov <da...@gmail.com> on 2016/02/25 00:45:42 UTC, 0 replies.
- chang hadoop version when import spark - posted by YouPeng Yang <yy...@gmail.com> on 2016/02/25 03:00:22 UTC, 0 replies.
- How does Spark streaming's Kafka direct stream survive from worker node failure? - posted by Yuhang Chen <yu...@gmail.com> on 2016/02/25 04:05:36 UTC, 2 replies.
- A question about Spark URL Usage: hostname vs IP address - posted by Yu Song <ap...@gmail.com> on 2016/02/25 04:06:52 UTC, 0 replies.
- Error:java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE - posted by xiazhuchang <hk...@163.com> on 2016/02/25 04:25:55 UTC, 1 reply.
- Re: What is the point of alpha value in Collaborative Filtering in MLlib ? - posted by Hiroyuki Yamada <mo...@gmail.com> on 2016/02/25 07:33:43 UTC, 2 replies.
- Number partitions after a join - posted by Guillermo Ortiz <ko...@gmail.com> on 2016/02/25 11:42:25 UTC, 4 replies.
- which is a more appropriate form of ratings ? - posted by Hiroyuki Yamada <mo...@gmail.com> on 2016/02/25 12:20:29 UTC, 3 replies.
- select * from mytable where column1 in (select max(column1) from mytable) - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/25 12:23:38 UTC, 3 replies.
- Running executors missing in sparkUI - posted by Jan Štěrba <in...@jansterba.com> on 2016/02/25 13:28:05 UTC, 2 replies.
- Multiple user operations in spark. - posted by Udbhav Agarwal <ud...@syncoms.com> on 2016/02/25 14:49:25 UTC, 1 reply.
- Spark SQL partitioned tables - check for partition - posted by Deenar Toraskar <de...@gmail.com> on 2016/02/25 16:24:48 UTC, 3 replies.
- d.filter("id in max(id)") - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/25 17:24:38 UTC, 1 reply.
- Spark Streaming - processing/transforming DStreams using a custom Receiver - posted by Dominik Safaric <do...@gmail.com> on 2016/02/25 17:27:01 UTC, 1 reply.
- Spark 1.6.0 running jobs in yarn shows negative no of tasks in executor - posted by unk1102 <um...@gmail.com> on 2016/02/25 17:54:03 UTC, 2 replies.
- Access fields by name/index from Avro data read from Kafka through Spark Streaming - posted by Mohammad Tariq <do...@gmail.com> on 2016/02/25 20:06:56 UTC, 3 replies.
- Bug in DiskBlockManager subDirs logic? - posted by Zee Chen <ze...@gmail.com> on 2016/02/25 22:00:36 UTC, 2 replies.
- PLease help: installation of spark 1.6.0 on ubuntu fails - posted by Marco Mistroni <mm...@gmail.com> on 2016/02/25 22:54:25 UTC, 1 reply.
- Spark SQL support for sub-queries - posted by Mich Talebzadeh <mi...@cloudtechnologypartners.co.uk> on 2016/02/25 23:46:04 UTC, 13 replies.
- How to overwrite data dynamically to specific partitions in Spark SQL - posted by SRK <sw...@gmail.com> on 2016/02/26 01:42:17 UTC, 0 replies.
- ALS trainImplicit performance - posted by Roberto Pagliari <ro...@asos.com> on 2016/02/26 02:23:22 UTC, 1 reply.
- Saving and Loading Dataframes - posted by "raj.kumar" <ra...@hooklogic.com> on 2016/02/26 03:49:10 UTC, 3 replies.
- [Help]: DataframeNAfunction fill method throwing exception - posted by Divya Gehlot <di...@gmail.com> on 2016/02/26 05:27:26 UTC, 2 replies.
- Re: spark-xml data source (com.databricks.spark.xml) not working with spark 1.6 - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/02/26 05:45:27 UTC, 0 replies.
- merge join already sorted data? - posted by Ken Geis <ge...@gmail.com> on 2016/02/26 06:22:25 UTC, 1 reply.
- Survival Curves using AFT implementation in Spark - posted by Stuti Awasthi <st...@hcl.com> on 2016/02/26 07:35:06 UTC, 1 reply.
- When I merge some datas,can't go on... - posted by Bonsen <he...@126.com> on 2016/02/26 08:20:06 UTC, 1 reply.
- Is spark.driver.maxResultSize used correctly ? - posted by Jeff Zhang <zj...@gmail.com> on 2016/02/26 11:44:30 UTC, 2 replies.
- Dynamic allocation Spark - posted by alvarobrandon <al...@gmail.com> on 2016/02/26 12:14:30 UTC, 2 replies.
- Standalone vs. Mesos for production installation on a smallish cluster - posted by Petr Novak <os...@gmail.com> on 2016/02/26 12:40:45 UTC, 3 replies.
- Get all vertexes with outDegree equals to 0 with GraphX - posted by Guillermo Ortiz <ko...@gmail.com> on 2016/02/26 12:59:34 UTC, 5 replies.
- Java/Spark Library for interacting with Spark API - posted by Hans van den Bogert <ha...@gmail.com> on 2016/02/26 14:38:05 UTC, 1 reply.
- Task Output size in Spark WEB UI not the same as in HDFS - posted by alvarobrandon <al...@gmail.com> on 2016/02/26 14:59:49 UTC, 0 replies.
- kafka streaming topic partitions vs executors - posted by patcharee <Pa...@uni.no> on 2016/02/26 15:08:38 UTC, 1 reply.
- Hbase in spark - posted by Renu Yadav <yr...@gmail.com> on 2016/02/26 17:50:41 UTC, 2 replies.
- Mllib Logistic Regression performance relative to Mahout - posted by "raj.kumar" <ra...@hooklogic.com> on 2016/02/26 18:04:15 UTC, 1 reply.
- Clarification on RDD - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/26 18:40:48 UTC, 2 replies.
- Spark 1.5 on Mesos - posted by Ashish Soni <as...@gmail.com> on 2016/02/26 20:03:16 UTC, 6 replies.
- s3 access through proxy - posted by Joshua Buss <jo...@gmail.com> on 2016/02/26 20:29:46 UTC, 1 reply.
- Attempting to aggregate multiple values - posted by Daniel Imberman <da...@gmail.com> on 2016/02/26 21:29:02 UTC, 0 replies.
- Re: TaskCompletionListener and Exceptions - posted by Yin Yang <yy...@gmail.com> on 2016/02/27 00:52:38 UTC, 0 replies.
- Starting SPARK application in cluster mode from an IDE - posted by Gourav Sengupta <go...@gmail.com> on 2016/02/27 01:39:51 UTC, 0 replies.
- SparkML Using Pipeline API locally on driver - posted by Eugene Morozov <ev...@gmail.com> on 2016/02/27 01:52:07 UTC, 1 reply.
- Configure Spark Resource on AWS CLI Not Working - posted by Weiwei Zhang <wz...@dons.usfca.edu> on 2016/02/27 03:37:59 UTC, 0 replies.
- .cache() changes contents of RDD - posted by Yan Yang <ya...@wealthfront.com> on 2016/02/27 04:41:30 UTC, 3 replies.
- 2 tables join happens at Hive but not in spark - posted by Sandeep Khurana <sa...@infoworks.io> on 2016/02/27 11:10:35 UTC, 0 replies.
- Restrictions on SQL operations on Spark temporary tables - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/02/27 13:01:45 UTC, 1 reply.
- spark kafka receiver with different partition the consumer speed is unbanlance in one same executor - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2016/02/27 14:56:10 UTC, 0 replies.
- deal with datas' structure - posted by Bonsen <he...@126.com> on 2016/02/27 15:15:07 UTC, 0 replies.
- Ordering two dimensional arrays of (String, Int) in the order of second element - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/27 19:25:45 UTC, 5 replies.
- Spark streaming not remembering previous state - posted by Vinti Maheshwari <vi...@gmail.com> on 2016/02/27 21:28:25 UTC, 4 replies.
- output the datas(txt) - posted by Bonsen <he...@126.com> on 2016/02/28 02:20:25 UTC, 2 replies.
- Spark Integration Patterns - posted by mms <mo...@gmail.com> on 2016/02/28 15:25:37 UTC, 13 replies.
- Recommendation for a good book on Spark, beginner to moderate knowledge - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/02/28 22:48:34 UTC, 12 replies.
- Question about MEOMORY_AND_DISK persistence - posted by Vishnu Viswanath <vi...@gmail.com> on 2016/02/29 00:09:32 UTC, 2 replies.
- Pattern Matching over a Sequence of rows using Spark - posted by Jerry Lam <ch...@gmail.com> on 2016/02/29 01:41:51 UTC, 0 replies.
- Error when trying to overwrite a partition dynamically in Spark SQL - posted by SRK <sw...@gmail.com> on 2016/02/29 03:09:30 UTC, 0 replies.
- a basic question on first use of PySpark shell and example, which is failing - posted by "Taylor, Ronald C" <Ro...@pnnl.gov> on 2016/02/29 06:36:49 UTC, 4 replies.
- Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions - posted by Hossein Vatani <vh...@yahoo.com> on 2016/02/29 07:05:07 UTC, 0 replies.
- [Error]: Spark 1.5.2 + HiveHbase Integration - posted by Divya Gehlot <di...@gmail.com> on 2016/02/29 10:48:01 UTC, 1 reply.
- Unresolved dep when building project with spark 1.6 - posted by Hao Ren <in...@gmail.com> on 2016/02/29 11:19:25 UTC, 1 reply.
- [Help]: Steps to access hive table + Spark 1.5.2 + HbaseIntegration + Hive 1.2 + Hbase 1.1 - posted by Divya Gehlot <di...@gmail.com> on 2016/02/29 11:36:25 UTC, 0 replies.
- What is the best approach to perform concurrent updates from different jobs to a in memory dataframe registered as a temp table? - posted by Roger Marin <ro...@rogersmarin.com> on 2016/02/29 11:42:35 UTC, 0 replies.
- spark lda runs out of disk space - posted by "TheGeorge1918 ." <zh...@gmail.com> on 2016/02/29 12:20:36 UTC, 0 replies.
- Deadlock between UnifiedMemoryManager and BlockManager - posted by Sea <26...@qq.com> on 2016/02/29 13:48:02 UTC, 0 replies.
- Implementation of random algorithm walk in spark - posted by naveenkumarmarri <na...@gmail.com> on 2016/02/29 13:56:58 UTC, 1 reply.
- Spark on Windows platform - posted by gaurav pathak <ga...@gmail.com> on 2016/02/29 14:27:45 UTC, 4 replies.
- kafka + mysql filtering problem - posted by franco barrientos <fr...@exalitica.com> on 2016/02/29 15:00:23 UTC, 1 reply.
- [MLlib] How to set Loss to Gradient Boosted Tree in Java - posted by diplomatic Guru <di...@gmail.com> on 2016/02/29 16:21:23 UTC, 8 replies.
- Flattening Data within DataFrames - posted by Kevin Mellott <ke...@gmail.com> on 2016/02/29 17:54:57 UTC, 2 replies.
- Optimizing cartesian product using keys - posted by eahlberg <ea...@gmail.com> on 2016/02/29 17:56:23 UTC, 0 replies.
- Spark for client - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/02/29 19:57:01 UTC, 3 replies.
- perl Kafka::Producer, “Kafka::Exception::Producer”, “code”, -1000, “message”, "Invalid argument - posted by Vinti Maheshwari <vi...@gmail.com> on 2016/02/29 22:26:18 UTC, 1 reply.