- Re: data localisation in spark - posted by Sandy Ryza <sa...@cloudera.com> on 2015/06/01 00:11:45 UTC, 4 replies.
- Re: RDD staleness - posted by Michael Armbrust <mi...@databricks.com> on 2015/06/01 01:36:48 UTC, 0 replies.
- Re: Adding an indexed column - posted by ayan guha <gu...@gmail.com> on 2015/06/01 03:23:20 UTC, 1 replies.
- Re: Windowed Operations - posted by DMiner <ms...@outlook.com> on 2015/06/01 08:17:11 UTC, 0 replies.
- Create dataframe from saved objectfile RDD - posted by bipin <bi...@gmail.com> on 2015/06/01 08:27:42 UTC, 0 replies.
- Re: RDD boundaries and triggering processing using tags in the data - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/06/01 09:36:22 UTC, 0 replies.
- Re: SparkSQL can't read S3 path for hive external table - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/06/01 09:44:26 UTC, 2 replies.
- Re: Windows of windowed streams not displaying the expected results - posted by DMiner <ms...@outlook.com> on 2015/06/01 10:00:14 UTC, 0 replies.
- Don't understand "schedule jobs within an Application - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/01 10:14:39 UTC, 1 replies.
- RE: Spark Executor Memory Usage - posted by "HuS.Andy" <hu...@hotmail.com> on 2015/06/01 10:41:50 UTC, 0 replies.
- Cassanda example - posted by Yasemin Kaya <go...@gmail.com> on 2015/06/01 10:48:24 UTC, 1 replies.
- RE: FW: Websphere MQ as a data source for Apache Spark Streaming - posted by "Chaudhary, Umesh" <Um...@searshc.com> on 2015/06/01 11:00:30 UTC, 0 replies.
- Re: Execption writing on two cassandra tables NoHostAvailableException: All host(s) tried for query failed (no host was tried) - posted by Helena Edelson <he...@datastax.com> on 2015/06/01 13:26:11 UTC, 3 replies.
- Spark stages very slow to complete - posted by Karlson <ks...@siberie.de> on 2015/06/01 16:52:50 UTC, 2 replies.
- Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/01 16:58:58 UTC, 0 replies.
- java.io.IOException: FAILED_TO_UNCOMPRESS(5) - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/01 17:00:26 UTC, 4 replies.
- Streaming K-medoids - posted by Marko Dinic <ma...@nissatech.com> on 2015/06/01 17:28:33 UTC, 2 replies.
- UNSUBSCRIBE - posted by "Rivera, Dario" <da...@amazon.com> on 2015/06/01 17:32:28 UTC, 0 replies.
- Event Logging to HDFS on Standalone Cluster "In Progress" - posted by Richard Marscher <rm...@localytics.com> on 2015/06/01 18:23:14 UTC, 2 replies.
- flatMap output on disk / flatMap memory overhead - posted by "octavian.ganea" <oc...@inf.ethz.ch> on 2015/06/01 19:32:17 UTC, 4 replies.
- using pyspark with standalone cluster - posted by AlexG <sw...@gmail.com> on 2015/06/01 20:18:00 UTC, 1 replies.
- RE: Anybody using Spark SQL JDBC server with DSE Cassandra? - posted by Mohammed Guller <mo...@glassbeam.com> on 2015/06/01 20:33:34 UTC, 2 replies.
- RE: Need some Cassandra integration help - posted by Mohammed Guller <mo...@glassbeam.com> on 2015/06/01 20:40:20 UTC, 0 replies.
- Re: Restricting the number of iterations in Mllib Kmeans - posted by Joseph Bradley <jo...@databricks.com> on 2015/06/01 21:02:05 UTC, 0 replies.
- RE: Migrate Relational to Distributed - posted by Mohammed Guller <mo...@glassbeam.com> on 2015/06/01 21:11:40 UTC, 0 replies.
- SparkSQL's performance gets degraded depending on number of partitions of Hive tables..is it normal? - posted by ogoh <ok...@gmail.com> on 2015/06/01 21:26:16 UTC, 1 replies.
- Re: union and reduceByKey wrong shuffle? - posted by Igor Berman <ig...@gmail.com> on 2015/06/01 21:31:25 UTC, 4 replies.
- Spark 1.3.1 On Mesos Issues. - posted by John Omernik <jo...@omernik.com> on 2015/06/01 21:49:18 UTC, 6 replies.
- Dataframe random permutation? - posted by Cesar Flores <ce...@gmail.com> on 2015/06/01 21:49:32 UTC, 1 replies.
- Re: PySpark with OpenCV causes python worker to crash - posted by Davies Liu <da...@databricks.com> on 2015/06/01 23:06:37 UTC, 6 replies.
- map - reduce only with disk - posted by "octavian.ganea" <oc...@inf.ethz.ch> on 2015/06/01 23:21:12 UTC, 3 replies.
- Spark 1.3.1 bundle does not build - unresolved dependency - posted by Stephen Boesch <ja...@gmail.com> on 2015/06/01 23:21:49 UTC, 1 replies.
- How to monitor Spark Streaming from Kafka? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/01 23:23:13 UTC, 5 replies.
- Re: Spark updateStateByKey fails with class leak when using case classes - resend - posted by Tathagata Das <ta...@gmail.com> on 2015/06/01 23:56:57 UTC, 0 replies.
- HDFS Rest Service not available - posted by Su She <su...@gmail.com> on 2015/06/02 01:33:48 UTC, 2 replies.
- Re: Best strategy for Pandas -> Spark - posted by Davies Liu <da...@databricks.com> on 2015/06/02 02:38:32 UTC, 1 replies.
- Re: deos randomSplit return a copy or a reference to the original rdd? [Python] - posted by Davies Liu <da...@databricks.com> on 2015/06/02 02:40:36 UTC, 0 replies.
- Building Spark for Hadoop 2.6.0 - posted by Mulugeta Mammo <mu...@gmail.com> on 2015/06/02 02:51:33 UTC, 1 replies.
- SparkR installation on windows issue - posted by Daniel Emaasit <da...@gmail.com> on 2015/06/02 04:00:57 UTC, 1 replies.
- GroupBy on RDD returns empty collection - posted by Malte <ma...@gmail.com> on 2015/06/02 04:34:40 UTC, 1 replies.
- Spark 1.3.0: how to let Spark history load old records? - posted by Haopu Wang <HW...@qilinsoft.com> on 2015/06/02 05:36:34 UTC, 2 replies.
- Join highly skewed datasets - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/02 08:02:03 UTC, 27 replies.
- SparkSQL: How to specify replication factor on the persisted parquet files? - posted by Haopu Wang <HW...@qilinsoft.com> on 2015/06/02 08:28:57 UTC, 5 replies.
- Union data type - posted by "Agarwal, Shagun" <sh...@paypal.com.INVALID> on 2015/06/02 09:17:14 UTC, 1 replies.
- Insert overwrite to hive - ArrayIndexOutOfBoundsException - posted by patcharee <Pa...@uni.no> on 2015/06/02 09:23:55 UTC, 0 replies.
- spark sql - reading data from sql tables having space in column names - posted by Sachin Goyal <sa...@jabong.com> on 2015/06/02 10:43:51 UTC, 3 replies.
- Spark 1.4.0-rc3: Actor not found - posted by Anders Arpteg <ar...@spotify.com> on 2015/06/02 11:11:56 UTC, 4 replies.
- What is shuffle read and what is shuffle write ? - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/02 11:36:38 UTC, 1 replies.
- Shared / NFS filesystems - posted by Pradyumna Achar <pr...@gmail.com> on 2015/06/02 11:52:59 UTC, 1 replies.
- build jar with all dependencies - posted by Pa Rö <pa...@googlemail.com> on 2015/06/02 12:58:33 UTC, 4 replies.
- Compute Median in Spark Dataframe - posted by Olivier Girardot <o....@lateral-thoughts.com> on 2015/06/02 14:07:01 UTC, 8 replies.
- How to read sequence File. - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/02 14:33:43 UTC, 2 replies.
- Transactional guarantee while saving DataFrame into a DB - posted by Mohammad Tariq <do...@gmail.com> on 2015/06/02 17:01:06 UTC, 1 replies.
- Re: Re: spark 1.3.1 jars in repo1.maven.org - posted by Ryan Williams <ry...@gmail.com> on 2015/06/02 18:08:36 UTC, 3 replies.
- Cant figure out spark-sql errors - switching to Impala - sorry guys - posted by Sanjay Subramanian <sa...@yahoo.com.INVALID> on 2015/06/02 18:38:21 UTC, 0 replies.
- updateStateByKey and kafka direct approach without shuffle - posted by Krot Viacheslav <kr...@gmail.com> on 2015/06/02 18:50:38 UTC, 4 replies.
- Scala By the Bay + Big Data Scala 2015 Program Announced - posted by Alexy Khrabrov <al...@scalable.pro> on 2015/06/02 19:07:35 UTC, 0 replies.
- Embedding your own transformer in Spark.ml Pipleline - posted by dimple <di...@gmail.com> on 2015/06/02 19:19:22 UTC, 8 replies.
- [OFFTOPIC] Big Data Application Meetup - posted by Alex Baranau <al...@gmail.com> on 2015/06/02 20:15:04 UTC, 0 replies.
- Issues with Spark Streaming and Manual Clock used for Unit Tests - posted by mobsniuk <mo...@gmail.com> on 2015/06/02 20:47:04 UTC, 0 replies.
- Can't build Spark 1.3 - posted by "Yakubovich, Alexey" <Al...@searshc.com> on 2015/06/02 21:16:47 UTC, 2 replies.
- Re: IDE for sparkR - posted by Emaasit <da...@gmail.com> on 2015/06/02 21:33:21 UTC, 0 replies.
- DataFrames coming in SparkR in Apache Spark 1.4.0 - posted by Emaasit <da...@gmail.com> on 2015/06/02 21:38:56 UTC, 1 replies.
- Can't build Spark - posted by Mulugeta Mammo <mu...@gmail.com> on 2015/06/02 23:50:54 UTC, 3 replies.
- Behavior of the spark.streaming.kafka.maxRatePerPartition config param? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/03 00:28:25 UTC, 1 replies.
- Scripting with groovy - posted by Paolo Platter <pa...@agilelab.it> on 2015/06/03 00:57:21 UTC, 1 replies.
- How to limit the total number of objects in a DStream maintained by updateStateByKey? - posted by frank_zhang <fr...@gmail.com> on 2015/06/03 01:17:57 UTC, 0 replies.
- Spark 1.4 & YARN Application Master fails with 500 connect refused - posted by Night Wolf <ni...@gmail.com> on 2015/06/03 02:29:54 UTC, 3 replies.
- Re: Spark Streming yarn-cluster Mode Off-heap Memory Is Constantly Growing - posted by Ji ZHANG <zh...@gmail.com> on 2015/06/03 04:33:44 UTC, 6 replies.
- Re: in GraphX,program with Pregel runs slower and slower after several iterations - posted by Cheuk Lam <ch...@hotmail.com> on 2015/06/03 05:22:18 UTC, 1 replies.
- Application is "always" in process when I check out logs of completed application - posted by amghost <zh...@outlook.com> on 2015/06/03 06:04:51 UTC, 2 replies.
- Filter operation to return two RDDs at once. - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/03 07:27:51 UTC, 5 replies.
- How to create fewer output files for Spark job ? - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/03 07:54:09 UTC, 2 replies.
- Spark Client - posted by pavan kumar Kolamuri <pa...@gmail.com> on 2015/06/03 08:06:20 UTC, 6 replies.
- ERROR cluster.YarnScheduler: Lost executor - posted by patcharee <Pa...@uni.no> on 2015/06/03 08:31:15 UTC, 6 replies.
- MetaException(message:java.security.AccessControlException: Permission denied - posted by patcharee <Pa...@uni.no> on 2015/06/03 09:05:37 UTC, 0 replies.
- Make HTTP requests from within Spark - posted by kasparfischer <Ka...@dreizak.com> on 2015/06/03 09:49:03 UTC, 3 replies.
- Error: Building Spark 1.4.0 from Github-1.4 release branch - posted by Emaasit <da...@gmail.com> on 2015/06/03 11:19:28 UTC, 0 replies.
- Re: Spark 1.4.0 build Error on Windows - posted by Daniel Emaasit <da...@gmail.com> on 2015/06/03 11:36:35 UTC, 2 replies.
- run spark submit on cloudera cluster - posted by Pa Rö <pa...@googlemail.com> on 2015/06/03 13:28:47 UTC, 0 replies.
- ALS Rating Object - posted by Yasemin Kaya <go...@gmail.com> on 2015/06/03 14:04:32 UTC, 2 replies.
- Objects serialized before foreachRDD/foreachPartition ? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/03 14:55:47 UTC, 4 replies.
- Equivalent to Storm's 'field grouping' in Spark. - posted by allonsy <lu...@gmail.com> on 2015/06/03 15:31:19 UTC, 2 replies.
- columnar structure of RDDs from Parquet or ORC files - posted by kiran lonikar <lo...@gmail.com> on 2015/06/03 16:40:50 UTC, 7 replies.
- Does Apache Spark maintain a columnar structure when creating RDDs from Parquet or ORC files? - posted by lonikar <lo...@gmail.com> on 2015/06/03 16:58:47 UTC, 1 replies.
- Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? - posted by Dmitry Goldenberg <dg...@gmail.com> on 2015/06/03 17:14:29 UTC, 14 replies.
- Example Page Java Function2 - posted by "linkstar350 ." <tw...@gmail.com> on 2015/06/03 18:23:12 UTC, 2 replies.
- StreamingListener, anyone? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/03 18:39:21 UTC, 3 replies.
- Re: Broadcast variables can be rebroadcast? - posted by NB <nb...@gmail.com> on 2015/06/03 18:40:27 UTC, 0 replies.
- Spark 1.4.0-rc4 HiveContext.table("db.tbl") NoSuchTableException - posted by Doug Balog <do...@dugos.com> on 2015/06/03 19:45:21 UTC, 5 replies.
- Managing spark processes via supervisord - posted by Mike Trienis <mi...@orcsol.com> on 2015/06/03 20:46:13 UTC, 3 replies.
- Python Image Library and Spark - posted by Justin Spargur <jm...@gmail.com> on 2015/06/03 22:27:49 UTC, 2 replies.
- Problem reading Parquet from 1.2 to 1.3 - posted by Don Drake <do...@gmail.com> on 2015/06/03 22:39:37 UTC, 4 replies.
- [ANNOUNCE] YARN support in Spark EC2 - posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu> on 2015/06/04 01:32:53 UTC, 0 replies.
- RandomForest - subsamplingRate parameter - posted by Andrew Leverentz <An...@fico.com> on 2015/06/04 01:40:14 UTC, 1 replies.
- Standard Scaler taking 1.5hrs - posted by Piero Cinquegrana <pc...@marketshare.com> on 2015/06/04 01:53:32 UTC, 0 replies.
- How to pass system properties in spark ? - posted by Ashwin Shankar <as...@gmail.com> on 2015/06/04 02:38:18 UTC, 0 replies.
- Adding new Spark workers on AWS EC2 - access error - posted by barmaley <ol...@solver.com> on 2015/06/04 03:15:43 UTC, 2 replies.
- importerror using external library with pyspark - posted by AlexG <sw...@gmail.com> on 2015/06/04 03:24:27 UTC, 1 replies.
- Re: Spark Cluster Benchmarking Frameworks - posted by Zhen Jia <ji...@ict.ac.cn> on 2015/06/04 03:43:39 UTC, 0 replies.
- TreeReduce Functionality in Spark - posted by raggy <ra...@gmail.com> on 2015/06/04 04:10:58 UTC, 5 replies.
- Spark 1.4 HiveContext fails to initialise with native libs - posted by Night Wolf <ni...@gmail.com> on 2015/06/04 06:38:36 UTC, 3 replies.
- Re: Standard Scaler taking 1.5hrs - posted by DB Tsai <db...@dbtsai.com> on 2015/06/04 06:53:54 UTC, 4 replies.
- NullPointerException SQLConf.setConf - posted by patcharee <Pa...@uni.no> on 2015/06/04 09:45:50 UTC, 1 replies.
- Spark ML decision list - posted by Sateesh Kavuri <sa...@gmail.com> on 2015/06/04 11:14:32 UTC, 3 replies.
- Difference bewteen library dependencies version - posted by Jean-Charles RISCH <ri...@gmail.com> on 2015/06/04 11:53:51 UTC, 1 replies.
- inlcudePackage() deprecated? - posted by Daniel Emaasit <da...@gmail.com> on 2015/06/04 12:10:16 UTC, 2 replies.
- SparkSQL DF.explode with Nulls - posted by Tom Seddon <mr...@gmail.com> on 2015/06/04 13:05:48 UTC, 1 replies.
- large shuffling => executor lost? - posted by Yifan LI <ia...@gmail.com> on 2015/06/04 13:24:41 UTC, 0 replies.
- Optimisation advice for Avro->Parquet merge job - posted by James Aley <ja...@swiftkey.com> on 2015/06/04 15:29:16 UTC, 5 replies.
- Big performance difference when joining 3 tables in different order - posted by Hao Ren <in...@gmail.com> on 2015/06/04 16:10:59 UTC, 0 replies.
- Scaling spark jobs returning large amount of data - posted by Giuseppe Sarno <Gi...@fico.com> on 2015/06/04 16:30:09 UTC, 2 replies.
- Spark Job always cause a node to reboot - posted by Chao Chen <ka...@gmail.com> on 2015/06/04 16:59:18 UTC, 2 replies.
- How to speed up Spark Job? - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/04 17:04:56 UTC, 0 replies.
- Setting S3 output file grantees for spark output files - posted by Justin Steigel <js...@gmail.com> on 2015/06/04 17:10:11 UTC, 2 replies.
- Deduping events using Spark - posted by lbierman <le...@gmail.com> on 2015/06/04 19:10:21 UTC, 2 replies.
- How to run spark streaming application on YARN? - posted by Saiph Kappa <sa...@gmail.com> on 2015/06/04 19:20:07 UTC, 7 replies.
- TF-IDF Question - posted by franco barrientos <fr...@exalitica.com> on 2015/06/04 19:46:37 UTC, 1 replies.
- Re: Required settings for permanent HDFS Spark on EC2 - posted by barmaley <ol...@solver.com> on 2015/06/04 20:46:01 UTC, 1 replies.
- Re: Roadmap for Spark with Kafka on Scala 2.11? - posted by algermissen1971 <al...@icloud.com> on 2015/06/04 22:24:19 UTC, 1 replies.
- Error running sbt package on Windows 7 for Spark 1.3.1 and SimpleApp.scala - posted by Joseph Washington <br...@gmail.com> on 2015/06/04 22:55:54 UTC, 0 replies.
- How to share large resources like dictionaries while processing data with Spark ? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/04 23:49:46 UTC, 11 replies.
- sqlCtx.load a single big csv file from s3 in parallel - posted by gy8 <ga...@gmail.com> on 2015/06/05 00:54:29 UTC, 0 replies.
- SparkSQL : using Hive UDF returning Map throws "rror: scala.MatchError: interface java.util.Map (of class java.lang.Class) (state=,code=0)" - posted by ogoh <ok...@gmail.com> on 2015/06/05 04:10:25 UTC, 4 replies.
- Column operation on Spark RDDs. - posted by Carter <gy...@hotmail.com> on 2015/06/05 04:45:56 UTC, 2 replies.
- Why the default Params.copy doesn't work for Model.copy? - posted by Justin Yip <yi...@prediction.io> on 2015/06/05 07:09:58 UTC, 1 replies.
- Spark terminology - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/05 07:19:28 UTC, 0 replies.
- FetchFailed Exception - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/05 07:25:26 UTC, 1 replies.
- Error when job submitting to Rest URL of master - posted by pavan kumar Kolamuri <pa...@gmail.com> on 2015/06/05 08:56:59 UTC, 0 replies.
- Avro or Parquet ? - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/05 09:00:04 UTC, 1 replies.
- Access several s3 buckets, with credentials containing "/" - posted by Pierre B <pi...@realimpactanalytics.com> on 2015/06/05 09:02:50 UTC, 3 replies.
- How to increase the number of tasks - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/05 11:48:14 UTC, 7 replies.
- Articles related with how spark handles spark components(Driver,Worker,Executor, Task) failure - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/05 11:50:17 UTC, 0 replies.
- Saving calculation to single local file - posted by marcos rebelo <ol...@gmail.com> on 2015/06/05 12:16:29 UTC, 4 replies.
- Nested reduceByKeyAndWindow losing data - posted by Alexander Krasheninnikov <a....@corp.badoo.com> on 2015/06/05 13:25:47 UTC, 0 replies.
- Slow file listing when loading records from in S3 without filename or wildcard - posted by Ewan Leith <ew...@realitymine.com> on 2015/06/05 15:17:40 UTC, 0 replies.
- Accumulator map - posted by Cosmin Cătălin Sanda <co...@gmail.com> on 2015/06/05 15:36:06 UTC, 1 replies.
- Override Logging with spark-streaming - posted by ni...@free.fr on 2015/06/05 15:56:52 UTC, 1 replies.
- redshift spark - posted by Hafiz Mujadid <ha...@gmail.com> on 2015/06/05 16:25:27 UTC, 2 replies.
- Cassandra Submit - posted by Yasemin Kaya <go...@gmail.com> on 2015/06/05 16:31:21 UTC, 17 replies.
- Spark Streaming for Each RDD - Exception on Empty - posted by John Omernik <jo...@omernik.com> on 2015/06/05 17:08:13 UTC, 3 replies.
- Saving compressed textFiles from a DStream in Scala - posted by doki_pen <rc...@gmail.com> on 2015/06/05 17:40:27 UTC, 3 replies.
- Spark SQL and Streaming Results - posted by Pietro Gentile <pi...@gmail.com> on 2015/06/05 17:41:27 UTC, 3 replies.
- Shuffle strange error - posted by "octavian.ganea" <oc...@inf.ethz.ch> on 2015/06/05 18:47:08 UTC, 1 replies.
- Re: Pregel runs slower and slower when each Pregel has data dependency - posted by dash <bs...@nd.edu> on 2015/06/05 20:05:41 UTC, 0 replies.
- Removing Keys from a MapType - posted by chrish2312 <ch...@palantir.com> on 2015/06/05 20:11:33 UTC, 0 replies.
- SparkContext & Threading - posted by Lee McFadden <sp...@gmail.com> on 2015/06/05 20:48:34 UTC, 14 replies.
- Job aborted - posted by Giovanni Paolo Gibilisco <gi...@gmail.com> on 2015/06/05 21:55:24 UTC, 1 replies.
- Multi-node Docker based *Spark 1.3.1* clusters on VirtualBox(Mac)/EC2 instance - posted by Anant Chintamaneni <an...@gmail.com> on 2015/06/05 23:02:29 UTC, 0 replies.
- Re: Loading CSV to DataFrame and saving it into Parquet for speedup - posted by Hossein <fa...@gmail.com> on 2015/06/06 00:35:48 UTC, 0 replies.
- Re: Can you specify partitions? - posted by amghost <zh...@outlook.com> on 2015/06/06 01:08:13 UTC, 0 replies.
- Running SparkSql against Hive tables - posted by James Pirz <ja...@gmail.com> on 2015/06/06 03:06:20 UTC, 7 replies.
- which database for gene alignment data ? - posted by roni <ro...@gmail.com> on 2015/06/06 08:42:43 UTC, 5 replies.
- Filling Parquet files by values in Value of a JavaPairRDD - posted by Mohamed Nadjib Mami <ma...@iai.uni-bonn.de> on 2015/06/06 09:43:40 UTC, 0 replies.
- RE: Problem getting program to run on 15TB input - posted by Kapil Malik <km...@adobe.com> on 2015/06/06 09:50:05 UTC, 1 replies.
- Which class takes place of BlockManagerWorker in Spark 1.3.1 - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/06 11:23:51 UTC, 1 replies.
- Logging in spark-shell on master - posted by Robert Pond <ro...@gmail.com> on 2015/06/06 15:38:57 UTC, 0 replies.
- write multiple outputs by key - posted by patcharee <Pa...@uni.no> on 2015/06/06 17:07:46 UTC, 1 replies.
- Spark Streaming Stuck After 10mins Issue... - posted by EH <ea...@gmail.com> on 2015/06/06 22:03:16 UTC, 4 replies.
- hiveContext.sql NullPointerException - posted by patcharee <Pa...@uni.no> on 2015/06/07 04:06:48 UTC, 6 replies.
- Don't understand the numbers on the Storage UI(/storage/rdd/?id=4) - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/07 05:22:18 UTC, 0 replies.
- Optimization module in Python mllib - posted by martingoodson <ma...@gmail.com> on 2015/06/07 10:42:04 UTC, 1 replies.
- Monitoring Spark Jobs - posted by SamyaMaiti <sa...@gmail.com> on 2015/06/07 12:37:25 UTC, 3 replies.
- Re: Caching parquet table (with GZIP) on Spark 1.3.1 - posted by Cheng Lian <li...@gmail.com> on 2015/06/07 16:25:54 UTC, 0 replies.
- Re: Not understanding manually building EC2 cluster - posted by Akhil <ak...@sigmoidanalytics.com> on 2015/06/07 18:36:39 UTC, 0 replies.
- Driver crash at the end with InvocationTargetException when running SparkPi - posted by Dong Lei <do...@microsoft.com> on 2015/06/08 05:31:06 UTC, 2 replies.
- Examples of flatMap in dataFrame - posted by Dimp Bhat <di...@gmail.com> on 2015/06/08 06:22:37 UTC, 1 replies.
- FlatMap in DataFrame - posted by dimple <di...@gmail.com> on 2015/06/08 06:36:56 UTC, 0 replies.
- Good Spark consultants? - posted by jakeheller <ja...@casetext.com> on 2015/06/08 08:06:35 UTC, 0 replies.
- How to obtain ActorSystem and/or ActorFlowMaterializer in updateStateByKey - posted by algermissen1971 <al...@icloud.com> on 2015/06/08 08:29:16 UTC, 0 replies.
- Jobs aborted due to EventLoggingListener Filesystem closed - posted by "igor.berman" <ig...@gmail.com> on 2015/06/08 10:43:40 UTC, 1 replies.
- spark ssh to slave - posted by James King <ja...@gmail.com> on 2015/06/08 11:21:44 UTC, 3 replies.
- Official Mllib API does not correspond to auto completion - posted by Jean-Charles RISCH <ri...@gmail.com> on 2015/06/08 11:26:28 UTC, 1 replies.
- Error in using saveAsParquetFile - posted by bipin <bi...@gmail.com> on 2015/06/08 11:29:43 UTC, 6 replies.
- path to hdfs - posted by Pa Rö <pa...@googlemail.com> on 2015/06/08 12:45:16 UTC, 2 replies.
- How to decrease the time of storing block in memory - posted by lu...@sina.com on 2015/06/08 14:15:30 UTC, 3 replies.
- coGroup on RDD - posted by elbehery <el...@gmail.com> on 2015/06/08 14:50:42 UTC, 1 replies.
- SparkSQL nested dictionaries - posted by mrm <ma...@skimlinks.com> on 2015/06/08 15:00:26 UTC, 1 replies.
- Transform Functions and Python Modules - posted by John Omernik <jo...@omernik.com> on 2015/06/08 15:27:41 UTC, 0 replies.
- spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS - posted by Kostas Kougios <ko...@googlemail.com> on 2015/06/08 16:01:43 UTC, 5 replies.
- unsubscribe - posted by Ricardo Goncalves da Silva <ri...@telefonica.com> on 2015/06/08 16:50:21 UTC, 1 replies.
- FileOutputCommitter deadlock 1.3.1 - posted by Richard Marscher <rm...@localytics.com> on 2015/06/08 16:55:09 UTC, 1 replies.
- Create multiple rows from elements in array on a single row - posted by Bill Q <bi...@gmail.com> on 2015/06/08 19:57:47 UTC, 1 replies.
- k-means for text mining in a streaming context - posted by Ruslan Dautkhanov <da...@gmail.com> on 2015/06/08 20:05:37 UTC, 1 replies.
- RDD of RDDs - posted by ping yan <sh...@gmail.com> on 2015/06/08 22:55:53 UTC, 5 replies.
- [Kafka-Spark-Consumer] Spark-Streaming Job Fails due to Futures timed out - posted by Snehal Nagmote <na...@gmail.com> on 2015/06/09 01:44:18 UTC, 3 replies.
- spark eventLog and history server - posted by Du Li <li...@yahoo-inc.com.INVALID> on 2015/06/09 01:57:20 UTC, 1 replies.
- Re: Wired Problem: Task not serializable[Spark Streaming] - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/09 04:01:42 UTC, 1 replies.
- Re: How does lineage get passed down in RDDs - posted by maxdml <ma...@cs.duke.edu> on 2015/06/09 04:11:06 UTC, 0 replies.
- Running SparkPi ( or JavaWordCount) example fails with "Job aborted due to stage failure: Task serialization failed" - posted by Elkhan Dadashov <el...@gmail.com> on 2015/06/09 04:13:51 UTC, 0 replies.
- Spark Python with SequenceFile containing numpy deserialized data in str form - posted by Sam Stoelinga <sa...@gmail.com> on 2015/06/09 05:04:44 UTC, 2 replies.
- ClassNotDefException when using spark-submit with multiple jars and files located on HDFS - posted by Dong Lei <do...@microsoft.com> on 2015/06/09 05:35:31 UTC, 7 replies.
- Spark SQL with Thrift Server is very very slow and finally failing - posted by Sourav Mazumder <so...@gmail.com> on 2015/06/09 05:52:01 UTC, 7 replies.
- Spark error "value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]" - posted by amit tewari <am...@gmail.com> on 2015/06/09 06:44:01 UTC, 8 replies.
- Spark compilation issue on intellij - posted by canan chen <cc...@gmail.com> on 2015/06/09 07:19:41 UTC, 0 replies.
- Different Sorting RDD methods in Apache Spark - posted by raggy <ra...@gmail.com> on 2015/06/09 08:30:23 UTC, 4 replies.
- [SparkStreaming 1.3.0] Broadcast failure after setting "spark.cleaner.ttl" - posted by Haopu Wang <HW...@qilinsoft.com> on 2015/06/09 09:30:20 UTC, 4 replies.
- Re: Rdd of Rdds - posted by lonikar <lo...@gmail.com> on 2015/06/09 10:32:22 UTC, 0 replies.
- Reply: Re: How to decrease the time of storing block in memory - posted by lu...@sina.com on 2015/06/09 10:39:26 UTC, 0 replies.
- Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException - posted by Jeroen Vlek <j....@anchormen.nl> on 2015/06/09 10:40:13 UTC, 5 replies.
- Reply: Re: Re: How to decrease the time of storing block in memory - posted by lu...@sina.com on 2015/06/09 11:30:07 UTC, 0 replies.
- Re: How to keep a SQLContext instance alive in a spark streaming application's life cycle? - posted by drarse <dr...@gmail.com> on 2015/06/09 11:41:42 UTC, 2 replies.
- BigDecimal problem in parquet file - posted by bipin <bi...@gmail.com> on 2015/06/09 14:18:01 UTC, 8 replies.
- Spark 1.3.1 SparkSQL metastore exceptions - posted by "Needham, Guy" <Gu...@virginmedia.co.uk> on 2015/06/09 14:28:37 UTC, 4 replies.
- append file on hdfs - posted by Pa Rö <pa...@googlemail.com> on 2015/06/09 15:34:28 UTC, 2 replies.
- Join between DStream and Periodically-Changing-RDD - posted by Ilove Data <da...@gmail.com> on 2015/06/09 16:07:51 UTC, 6 replies.
- Issue running Spark 1.4 on Yarn - posted by Matt Kapilevich <ma...@gmail.com> on 2015/06/09 16:56:07 UTC, 15 replies.
- traverse a graph based on edge properties whilst counting matching vertex attributes - posted by MA2 <aa...@gmail.com> on 2015/06/09 17:29:35 UTC, 0 replies.
- Costs of transformations - posted by Vijayasarathy Kannan <kv...@vt.edu> on 2015/06/09 17:43:03 UTC, 0 replies.
- Implementing top() using treeReduce() - posted by raggy <ra...@gmail.com> on 2015/06/09 19:09:19 UTC, 5 replies.
- Kafka Spark Streaming: ERROR EndpointWriter: dropping message - posted by karma243 <as...@reducedata.com> on 2015/06/09 19:43:41 UTC, 3 replies.
- [SPARK-6330] 1.4.0/1.5.0 Bug to access S3 -- AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively). - posted by Shuai Zheng <sz...@gmail.com> on 2015/06/09 21:45:42 UTC, 0 replies.
- spark on yarn - posted by Neera <ne...@gmail.com> on 2015/06/09 22:46:11 UTC, 0 replies.
- Linear Regression with SGD - posted by Stephen Carman <sc...@coldlight.com> on 2015/06/09 23:05:54 UTC, 3 replies.
- Re: spark-submit working differently than pyspark when trying to find external jars - posted by Walt Schlender <wa...@hired.com> on 2015/06/09 23:56:19 UTC, 0 replies.
- Can a Spark App run with spark-submit write pdf files to HDFS - posted by Richard Catlin <ri...@gmail.com> on 2015/06/09 23:57:43 UTC, 2 replies.
- Re: Determining number of executors within RDD - posted by maxdml <ma...@cs.duke.edu> on 2015/06/10 01:04:40 UTC, 7 replies.
- how to clear state in Spark Streaming based on emitting - posted by Robert Towne <Ro...@WebTrends.com> on 2015/06/10 02:36:56 UTC, 0 replies.
- spark-submit does not use hive-site.xml - posted by James Pirz <ja...@gmail.com> on 2015/06/10 04:19:19 UTC, 3 replies.
- Spark's Scala shell killing itself - posted by Chandrashekhar Kotekar <sh...@gmail.com> on 2015/06/10 07:34:05 UTC, 1 replies.
- How to use Apache spark mllib Model output in C++ component - posted by mahesht <ma...@gmail.com> on 2015/06/10 08:20:30 UTC, 2 replies.
- how to maintain huge dataset while using spark streaming - posted by homar <ko...@gmail.com> on 2015/06/10 09:53:06 UTC, 0 replies.
- Re: Met OOM when fetching more than 1,000,000 rows. - posted by Cheng Lian <li...@databricks.com> on 2015/06/10 10:11:35 UTC, 0 replies.
- Re: Reply: Re: Met OOM when fetching more than 1,000,000 rows. - posted by Cheng Lian <li...@databricks.com> on 2015/06/10 10:33:17 UTC, 1 replies.
- Reply: Re: Re: Re: How to decrease the time of storing block in memory - posted by lu...@sina.com on 2015/06/10 10:36:30 UTC, 0 replies.
- DataFrame.save with SaveMode.Overwrite produces 3x higher data size - posted by bkapukaranov <b....@gmail.com> on 2015/06/10 11:22:56 UTC, 1 replies.
- cannot access port 4040 - posted by mrm <ma...@skimlinks.com> on 2015/06/10 12:21:12 UTC, 6 replies.
- Re: Reply: Re: Reply: Re: Met OOM when fetching more than 1,000,000 rows. - posted by Cheng Lian <li...@databricks.com> on 2015/06/10 13:55:14 UTC, 0 replies.
- spark uses too much memory maybe (binaryFiles() with more than 1 million files in HDFS), groupBy or reduceByKey() - posted by Kostas Kougios <ko...@googlemail.com> on 2015/06/10 14:24:34 UTC, 7 replies.
- [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient - posted by Roberto Coluccio <ro...@gmail.com> on 2015/06/10 14:32:04 UTC, 3 replies.
- learning rpc about spark core source code - posted by huangzheng <11...@qq.com> on 2015/06/10 14:33:30 UTC, 0 replies.
- Split RDD based on criteria - posted by dgoldenberg <dg...@gmail.com> on 2015/06/10 14:56:12 UTC, 2 replies.
- Re: learning rpc about spark core source code - posted by Shixiong Zhu <zs...@gmail.com> on 2015/06/10 15:04:09 UTC, 0 replies.
- Spark standalone mode and kerberized cluster - posted by kazeborja <ka...@gmail.com> on 2015/06/10 15:19:41 UTC, 5 replies.
- Re: PostgreSQL JDBC Classpath Issue - posted by shahab <sh...@gmail.com> on 2015/06/10 15:24:38 UTC, 1 replies.
- Fully in-memory shuffles - posted by Corey Nolet <cj...@gmail.com> on 2015/06/10 16:08:10 UTC, 7 replies.
- PYTHONPATH on worker nodes - posted by Bob Corsaro <rc...@gmail.com> on 2015/06/10 17:15:49 UTC, 1 replies.
- Spark not working on windows 7 64 bit - posted by Eran Medan <er...@gmail.com> on 2015/06/10 17:16:39 UTC, 1 replies.
- spark streaming - checkpointing - looking at old application directory and failure to start streaming context - posted by Ashish Nigam <as...@gmail.com> on 2015/06/10 18:14:13 UTC, 3 replies.
- How to build spark with Hive 1.x ? - posted by Neal Yin <ne...@workday.com> on 2015/06/10 18:16:56 UTC, 1 replies.
- Re: Spark Maven Test error - posted by Rick Moritz <ra...@gmail.com> on 2015/06/10 18:39:33 UTC, 0 replies.
- RE: [SPARK-6330] 1.4.0/1.5.0 Bug to access S3 -- AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively) - posted by Shuai Zheng <sz...@gmail.com> on 2015/06/10 20:38:02 UTC, 0 replies.
- Problem with pyspark on Docker talking to YARN cluster - posted by Ashwin Shankar <as...@gmail.com> on 2015/06/10 22:43:04 UTC, 1 replies.
- Re: Efficient way to get top K values per key in (key, value) RDD? - posted by erisa <er...@gmail.com> on 2015/06/10 22:53:50 UTC, 1 replies.
- Hive Custom Transform Scripts (read from stdin and print to stdout) in Spark - posted by nishanthps <ni...@gmail.com> on 2015/06/10 23:02:36 UTC, 0 replies.
- Re: How to set KryoRegistrator class in spark-shell - posted by bhomass <bh...@gmail.com> on 2015/06/11 00:16:07 UTC, 2 replies.
- Can't access Ganglia on EC2 Spark cluster - posted by barmaley <ol...@solver.com> on 2015/06/11 00:16:24 UTC, 1 replies.
- NullPointerException with functions.rand() - posted by Justin Yip <yi...@prediction.io> on 2015/06/11 03:15:02 UTC, 2 replies.
- how to deal with continued records - posted by Zhang Jiaqiang <zh...@gmail.com> on 2015/06/11 08:58:11 UTC, 2 replies.
- How many APPs does spark support to running simultaneously in one cluster? - posted by lu...@sina.com on 2015/06/11 09:33:32 UTC, 0 replies.
- How many APPs does spark support to run simultaneously in one cluster? - posted by lu...@sina.com on 2015/06/11 09:52:05 UTC, 0 replies.
- How to run scala script in Datastax Spark distribution? - posted by amit tewari <am...@gmail.com> on 2015/06/11 11:02:18 UTC, 0 replies.
- (Unknown) - posted by "Wangfei (X)" <wa...@huawei.com> on 2015/06/11 11:33:30 UTC, 2 replies.
- spark sql insert into table performance issue - posted by "Wangfei (X)" <wa...@huawei.com> on 2015/06/11 11:34:33 UTC, 0 replies.
- Re: Is there a way to limit the sql query result size? - posted by neeravsalaria <ne...@gmail.com> on 2015/06/11 11:50:41 UTC, 0 replies.
- Spark on Mesos fine-grained - has one core less per executor - posted by hbogert <ha...@gmail.com> on 2015/06/11 13:47:51 UTC, 0 replies.
- How to pass arguments dynamically, that needs to be used in executors - posted by gaurav sharma <sh...@gmail.com> on 2015/06/11 14:23:26 UTC, 2 replies.
- Reopen Jira or New Jira - posted by John Omernik <jo...@omernik.com> on 2015/06/11 14:34:40 UTC, 1 replies.
- ClassCastException: BlockManagerId cannot be cast to [B - posted by davidkl <da...@hotmail.com> on 2015/06/11 14:55:06 UTC, 1 replies.
- Reading Really Big File Stream from HDFS - posted by SLiZn Liu <sl...@gmail.com> on 2015/06/11 15:07:45 UTC, 3 replies.
- Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException - posted by shahab <sh...@gmail.com> on 2015/06/11 15:14:38 UTC, 2 replies.
- Could Spark batch processing live within Spark Streaming? - posted by diplomatic Guru <di...@gmail.com> on 2015/06/11 15:24:56 UTC, 1 replies.
- spark stream and spark sql with data warehouse - posted by 唐思成 <ja...@qq.com> on 2015/06/11 16:47:55 UTC, 2 replies.
- How to deal with an explosive flatmap - posted by matthewrj <ml...@gmail.com> on 2015/06/11 16:57:02 UTC, 1 replies.
- spark-sql from CLI --->EXCEPTION: java.lang.OutOfMemoryError: Java heap space - posted by Sanjay Subramanian <sa...@yahoo.com.INVALID> on 2015/06/11 17:43:27 UTC, 6 replies.
- ReduceByKey with a byte array as the key - posted by Mark Tse <Ma...@D2L.com> on 2015/06/11 17:57:44 UTC, 3 replies.
- Re: Running Spark in Local Mode - posted by mrm <ma...@skimlinks.com> on 2015/06/11 18:03:55 UTC, 0 replies.
- [ANNOUNCE] Announcing Spark 1.4 - posted by Patrick Wendell <pw...@gmail.com> on 2015/06/11 18:05:06 UTC, 0 replies.
- Limit Spark Shuffle Disk Usage - posted by Al M <al...@gmail.com> on 2015/06/11 18:16:12 UTC, 4 replies.
- takeSample() results in two stages - posted by barmaley <ol...@solver.com> on 2015/06/11 18:43:13 UTC, 1 replies.
- DataFrames for non-SQL computation? - posted by Tom Hubregtsen <th...@gmail.com> on 2015/06/11 19:08:57 UTC, 1 replies.
- Re: spark-1.2.0--standalone-ha-zookeeper - posted by scar0909 <sc...@gmail.com> on 2015/06/11 19:23:16 UTC, 0 replies.
- Re: Shutdown with streaming driver running in cluster broke master web UI permanently - posted by scar0909 <sc...@gmail.com> on 2015/06/11 20:24:14 UTC, 4 replies.
- Spark distinct() returns incorrect results for some types? - posted by Crystal Xing <cr...@gmail.com> on 2015/06/11 20:36:21 UTC, 5 replies.
- how to use a properties file from a url in spark-submit - posted by Gary Ogden <go...@gmail.com> on 2015/06/11 21:48:50 UTC, 5 replies.
- How to set spark master URL to contain domain name? - posted by "Wang, Ningjun (LNG-NPV)" <ni...@lexisnexis.com> on 2015/06/11 22:01:52 UTC, 2 replies.
- Re: UDF accessing hive struct array fails with buffer underflow from kryo - posted by Yutong Luo <yl...@groupon.com> on 2015/06/11 23:25:58 UTC, 0 replies.
- Deleting HDFS files from Pyspark - posted by Siegfried Bilstein <sb...@gmail.com> on 2015/06/12 00:56:56 UTC, 1 replies.
- Does MLLib has attribute importance? - posted by Ruslan Dautkhanov <da...@gmail.com> on 2015/06/12 01:33:39 UTC, 5 replies.
- Is it possible to see Spark jobs on MapReduce job history ? (running Spark on YARN cluster) - posted by Elkhan Dadashov <el...@gmail.com> on 2015/06/12 04:01:22 UTC, 1 replies.
- Spark Streaming reads from stdin or output from command line utility - posted by foobar <he...@fb.com> on 2015/06/12 04:25:53 UTC, 5 replies.
- Re: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0 - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2015/06/12 04:28:22 UTC, 0 replies.
- Error using json4s with Apache Spark in spark-shell - posted by Daniel Mahler <dm...@gmail.com> on 2015/06/12 06:08:58 UTC, 0 replies.
- Spark 1.4: Python API for getting Kafka offsets in direct mode? - posted by Amit Ramesh <am...@yelp.com> on 2015/06/12 06:36:45 UTC, 9 replies.
- Re: Reply: Re: Reply: Re: Reply: Re: Met OOM when fetching more than 1,000,000 rows. - posted by Cheng Lian <li...@databricks.com> on 2015/06/12 09:25:58 UTC, 0 replies.
- Optimizing Streaming from Websphere MQ - posted by "Chaudhary, Umesh" <Um...@searshc.com> on 2015/06/12 09:28:43 UTC, 0 replies.
- Re: Reply: Re: Reply: Re: Reply: Re: Reply: Re: Met OOM when fetching more than 1,000,000 rows. - posted by Cheng Lian <li...@databricks.com> on 2015/06/12 09:51:13 UTC, 3 replies.
- How to use Window Operations with kafka Direct-API? - posted by ZIGEN <db...@gmail.com> on 2015/06/12 10:10:44 UTC, 3 replies.
- Cannot start master with Spark 1.4.0 - posted by Alexis Seigneurin <as...@ippon.fr> on 2015/06/12 10:34:11 UTC, 5 replies.
- If not stop StreamingContext gracefully, will checkpoint data be consistent? - posted by Haopu Wang <HW...@qilinsoft.com> on 2015/06/12 10:57:22 UTC, 6 replies.
- Spark 1.4 release date - posted by ayan guha <gu...@gmail.com> on 2015/06/12 11:41:54 UTC, 5 replies.
- Upgrade to parquet 1.6.0 - posted by Eric Eijkelenboom <er...@gmail.com> on 2015/06/12 12:16:32 UTC, 2 replies.
- Scheduling and node affinity - posted by Brian Candler <b....@pobox.com> on 2015/06/12 12:53:38 UTC, 0 replies.
- Broadcast value - posted by Yasemin Kaya <go...@gmail.com> on 2015/06/12 15:27:56 UTC, 0 replies.
- Writing data to hbase using Sparkstreaming - posted by Vamshi Krishna <va...@gmail.com> on 2015/06/12 15:40:08 UTC, 0 replies.
- Fwd: Spark/PySpark errors on mysterious missing /tmp file - posted by John Berryman <jo...@eventbrite.com> on 2015/06/12 15:50:33 UTC, 0 replies.
- Spark Streaming - Can i BIND Spark Executor to Kafka Partition Leader - posted by gaurav sharma <sh...@gmail.com> on 2015/06/12 16:59:11 UTC, 0 replies.
- Issues with `when` in Column class - posted by Chris Freeman <cf...@alteryx.com> on 2015/06/12 17:05:12 UTC, 2 replies.
- Spark Streaming, updateStateByKey and mapPartitions() - and lazy "DatabaseConnection" - posted by algermissen1971 <al...@icloud.com> on 2015/06/12 17:07:08 UTC, 6 replies.
- [Spark-1.4.0]jackson-databind conflict? - posted by Earthson <Ea...@gmail.com> on 2015/06/12 17:20:44 UTC, 2 replies.
- Apache Spark architecture - posted by Vitalii Duk <vi...@perfectial.com> on 2015/06/12 17:40:45 UTC, 0 replies.
- [Spark-1.4.0] NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer - posted by Tao Li <li...@gmail.com> on 2015/06/12 17:40:54 UTC, 1 replies.
- Spark Java API and minimum set of 3rd party dependencies - posted by Elkhan Dadashov <el...@gmail.com> on 2015/06/12 20:14:41 UTC, 2 replies.
- [Spark 1.4.0]How to set driver's system property using spark-submit options? - posted by Peng Cheng <pc...@uow.edu.au> on 2015/06/12 20:17:15 UTC, 4 replies.
- Re: Exception when using CLUSTER BY or ORDER BY - posted by Reynold Xin <rx...@databricks.com> on 2015/06/12 20:23:54 UTC, 0 replies.
- log4j configuration ignored for some classes only - posted by "lomax0000@gmail.com" <lo...@gmail.com> on 2015/06/12 20:28:13 UTC, 0 replies.
- How to use spark for map-reduce flow to filter N columns, top M rows of all csv files under a folder? - posted by Rex X <dn...@gmail.com> on 2015/06/12 20:46:41 UTC, 2 replies.
- [Spark 1.4.0] java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation - posted by Peter Haumer <ph...@us.ibm.com> on 2015/06/12 20:58:36 UTC, 1 replies.
- --jars not working? - posted by Jonathan Coveney <jc...@gmail.com> on 2015/06/12 21:13:55 UTC, 2 replies.
- Re: Optimizing Streaming from Websphere MQ - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/06/12 21:39:39 UTC, 3 replies.
- Parquet Multiple Output - posted by Xin Liu <li...@gmail.com> on 2015/06/12 23:31:05 UTC, 1 replies.
- Extracting k-means cluster values along with centers? - posted by Minnow Noir <mi...@gmail.com> on 2015/06/13 00:44:19 UTC, 1 replies.
- Dynamic allocator requests -1 executors - posted by Patrick Woody <pa...@gmail.com> on 2015/06/13 01:42:03 UTC, 3 replies.
- Resource allocation configurations for Spark on Yarn - posted by Jim Green <op...@gmail.com> on 2015/06/13 01:42:44 UTC, 0 replies.
- Spark SQL and Skewed Joins - posted by Jon Walton <jo...@gmail.com> on 2015/06/13 02:15:14 UTC, 6 replies.
- Reliable SQS Receiver for Spark Streaming - posted by Michal Čizmazia <mi...@gmail.com> on 2015/06/13 02:44:19 UTC, 3 replies.
- [Spark] What is the most efficient way to do such a join and column manipulation? - posted by Rex X <dn...@gmail.com> on 2015/06/13 03:46:37 UTC, 3 replies.
- Are there ways to restrict what parameters users can set for a Spark job? - posted by YaoPau <jo...@gmail.com> on 2015/06/13 07:26:30 UTC, 1 replies.
- Re: Building scaladoc using "build/sbt unidoc" failure - posted by Reynold Xin <rx...@databricks.com> on 2015/06/13 08:17:36 UTC, 0 replies.
- Not albe to run FP-growth Example - posted by masoom alam <ma...@wanclouds.net> on 2015/06/13 11:10:27 UTC, 6 replies.
- How to split log data into different files according to severity - posted by Hao Wang <bi...@gmail.com> on 2015/06/13 11:41:22 UTC, 4 replies.
- How to silence Parquet logging? - posted by Chris Freeman <cf...@alteryx.com> on 2015/06/13 18:29:34 UTC, 2 replies.
- --packages & Failed to load class for data source v1.4 - posted by Don Drake <do...@gmail.com> on 2015/06/13 18:46:11 UTC, 3 replies.
- How to read avro in SparkR - posted by Shing Hing Man <ma...@yahoo.com.INVALID> on 2015/06/13 18:54:35 UTC, 3 replies.
- What is most efficient to do a large union and remove duplicates? - posted by Gavin Yue <yu...@gmail.com> on 2015/06/13 19:49:28 UTC, 4 replies.
- Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead. - posted by pth001 <Pa...@uni.no> on 2015/06/13 21:36:09 UTC, 3 replies.
- spark javadocs don't work in eclipse/java? - posted by Keith Freeman <8f...@gmail.com> on 2015/06/14 01:42:02 UTC, 0 replies.
- spark stream twitter question .. - posted by Mike Frampton <mi...@hotmail.com> on 2015/06/14 02:08:52 UTC, 0 replies.
- How to set up a Spark Client node? - posted by "MrAsanjar ." <af...@gmail.com> on 2015/06/14 02:47:22 UTC, 2 replies.
- Job marked as killed in spark 1.4 - posted by nizang <ni...@windward.eu> on 2015/06/14 08:39:06 UTC, 3 replies.
- Re: lower&upperBound not working/spark 1.3 - posted by Sujeevan <su...@gmail.com> on 2015/06/14 11:01:08 UTC, 1 replies.
- creation of RDD from a Tree - posted by lisp <li...@gmail.com> on 2015/06/14 16:50:16 UTC, 1 replies.
- What is the right algorithm to do cluster analysis with mixed numeric, categorical, and string value attributes? - posted by Rex X <dn...@gmail.com> on 2015/06/14 19:05:47 UTC, 4 replies.
- Spark SQL JDBC Source Join Error - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2015/06/14 20:57:52 UTC, 2 replies.
- DataFrame and JDBC regression? - posted by Peter Haumer <ph...@us.ibm.com> on 2015/06/14 23:20:48 UTC, 1 replies.
- Spark SQL - Complex query pushdown - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2015/06/15 01:39:42 UTC, 0 replies.
- Union of two Diff. types of DStreams - posted by anshu shukla <an...@gmail.com> on 2015/06/15 07:29:49 UTC, 1 replies.
- Spark DataFrame Reduce Job Took 40s for 6000 Rows - posted by Proust GZ Feng <pf...@cn.ibm.com> on 2015/06/15 07:57:25 UTC, 8 replies.
- [SparkStreaming] NPE in DStreamCheckPointData.scala:125 - posted by Haopu Wang <HW...@qilinsoft.com> on 2015/06/15 09:36:06 UTC, 1 replies.
- Worker is KILLED for no reason - posted by nizang <ni...@windward.eu> on 2015/06/15 11:03:22 UTC, 1 replies.
- Re: UNRESOLVED DEPENDENCIES while building Spark 1.3.0 - posted by François Garillot <fr...@typesafe.com> on 2015/06/15 14:45:01 UTC, 0 replies.
- Using queueStream - posted by anshu shukla <an...@gmail.com> on 2015/06/15 15:37:22 UTC, 0 replies.
- tasks won't run on mesos when using fine grained - posted by Gary Ogden <go...@gmail.com> on 2015/06/15 15:39:16 UTC, 2 replies.
- settings from props file seem to be ignored in mesos - posted by Gary Ogden <go...@gmail.com> on 2015/06/15 15:44:13 UTC, 2 replies.
- Running spark1.4 inside intellij idea HttpServletResponse - ClassNotFoundException - posted by Wwh 吴 <ww...@hotmail.com> on 2015/06/15 15:52:11 UTC, 1 replies.
- sql.catalyst.ScalaReflection scala.reflect.internal.MissingRequirementError - posted by patcharee <Pa...@uni.no> on 2015/06/15 15:55:30 UTC, 0 replies.
- *Metrics API is odd in MLLib - posted by Sam <sa...@gmail.com> on 2015/06/15 16:13:23 UTC, 2 replies.
- number of partitions in join: Spark documentation misleading! - posted by mrm <ma...@skimlinks.com> on 2015/06/15 17:00:19 UTC, 1 replies.
- Not getting event logs >= spark 1.3.1 - posted by Tsai Li Ming <ma...@ltsai.com> on 2015/06/15 17:26:42 UTC, 1 replies.
- spark sql and cassandra. spark generate 769 tasks to read 3 lines from cassandra table - posted by Serega Sheypak <se...@gmail.com> on 2015/06/15 18:26:20 UTC, 4 replies.
- Re: How can I use Tachyon with SPARK? - posted by Himanshu Mehra <hi...@gmail.com> on 2015/06/15 18:27:34 UTC, 0 replies.
- Error using spark 1.3.0 with maven - posted by Ritesh Kumar Singh <ri...@gmail.com> on 2015/06/15 19:16:52 UTC, 0 replies.
- DataFrame insertIntoJDBC parallelism while writing data into a DB table - posted by Mohammad Tariq <do...@gmail.com> on 2015/06/15 19:20:51 UTC, 2 replies.
- Does spark performance really scale out with multiple machines? - posted by "Wang, Ningjun (LNG-NPV)" <ni...@lexisnexis.com> on 2015/06/15 19:29:58 UTC, 2 replies.
- akka configuration not found - posted by Ritesh Kumar Singh <ri...@gmail.com> on 2015/06/15 19:53:18 UTC, 0 replies.
- Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded” - posted by diplomatic Guru <di...@gmail.com> on 2015/06/15 21:09:48 UTC, 1 replies.
- Problem: Custom Receiver for getting events from a Dynamic Queue - posted by anshu shukla <an...@gmail.com> on 2015/06/15 21:23:15 UTC, 0 replies.
- missing part of the file while using newHadoopApi - posted by "igor.berman" <ig...@gmail.com> on 2015/06/15 22:08:24 UTC, 0 replies.
- Re: Spark application in production without HDFS - posted by nsalian <ne...@gmail.com> on 2015/06/15 23:32:28 UTC, 1 replies.
- Re: flatmapping with other data - posted by dizzy5112 <da...@gmail.com> on 2015/06/16 03:25:02 UTC, 0 replies.
- Creating RDD from Iterable from groupByKey results - posted by Nirav Patel <np...@xactlycorp.com> on 2015/06/16 05:44:12 UTC, 1 replies.
- Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail - posted by Night Wolf <ni...@gmail.com> on 2015/06/16 05:47:39 UTC, 5 replies.
- How does one decide no of executors/cores/memory allocation? - posted by shreesh <sh...@gmail.com> on 2015/06/16 05:57:37 UTC, 5 replies.
- ALS predictALL not completing - posted by afarahat <ay...@yahoo.com> on 2015/06/16 06:20:07 UTC, 3 replies.
- Help!!!Map or join one large datasets then suddenly remote Akka client disassociated - posted by Jia Yu <ji...@asu.edu> on 2015/06/16 07:38:07 UTC, 0 replies.
- Re: RDD of Iterable[String] - posted by nir <ni...@gmail.com> on 2015/06/16 07:58:55 UTC, 0 replies.
- Re: What are the likely causes of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle? - posted by Jia Yu <ji...@asu.edu> on 2015/06/16 08:48:20 UTC, 3 replies.
- Spark Configuration of spark.worker.cleanup.appDataTtl - posted by lu...@sina.com on 2015/06/16 08:52:23 UTC, 1 replies.
- Spark 1.4 DataFrame Parquet file writing - missing random rows/partitions - posted by Nathan McCarthy <Na...@quantium.com.au> on 2015/06/16 09:30:54 UTC, 3 replies.
- Spark+hive bucketing - posted by Marcin Szymaniuk <ma...@gmail.com> on 2015/06/16 09:56:19 UTC, 0 replies.
- Reply: Re: Spark Configuration of spark.worker.cleanup.appDataTtl - posted by lu...@sina.com on 2015/06/16 10:02:57 UTC, 0 replies.
- SparkR 1.4.0: read.df() function fails - posted by esten <er...@dnvgl.com> on 2015/06/16 10:55:09 UTC, 4 replies.
- HiveContext saveAsTable create wrong partition - posted by patcharee <Pa...@uni.no> on 2015/06/16 11:14:14 UTC, 4 replies.
- cassandra with jdbcRDD - posted by Hafiz Mujadid <ha...@gmail.com> on 2015/06/16 13:01:46 UTC, 1 replies.
- how to maintain the offset for spark streaming if HDFS is the source - posted by Manohar753 <ma...@happiestminds.com> on 2015/06/16 14:25:58 UTC, 1 replies.
- Spark History Server pointing to S3 - posted by Gianluca Privitera <gi...@studio.unibo.it> on 2015/06/16 14:56:22 UTC, 2 replies.
- stop streaming context of job failure - posted by Krot Viacheslav <kr...@gmail.com> on 2015/06/16 15:35:16 UTC, 1 replies.
- The problem when share data inside Dstream - posted by Shuai Zhang <sm...@yahoo.com.INVALID> on 2015/06/16 17:15:43 UTC, 0 replies.
- Unit Testing Spark Transformations/Actions - posted by Mark Tse <Ma...@D2L.com> on 2015/06/16 18:19:40 UTC, 0 replies.
- HDFS not supported by databricks cloud :-( - posted by Sanjay Subramanian <sa...@yahoo.com.INVALID> on 2015/06/16 18:58:52 UTC, 1 replies.
- [SparkScore] Performance portal for Apache Spark - posted by "Huang, Jie" <ji...@intel.com> on 2015/06/16 19:27:18 UTC, 1 replies.
- Spark on EMR - posted by kamatsuoka <ke...@gmail.com> on 2015/06/16 21:29:36 UTC, 5 replies.
- Pyspark Dense Matrix Multiply : One of them can fit in Memory - posted by afarahat <ay...@yahoo.com> on 2015/06/16 21:30:53 UTC, 0 replies.
- spark-sql CLI options does not work --master yarn --deploy-mode client - posted by Sanjay Subramanian <sa...@yahoo.com.INVALID> on 2015/06/16 23:24:28 UTC, 0 replies.
- Re: FW: MLLIB (Spark) Question. - posted by DB Tsai <db...@dbtsai.com> on 2015/06/16 23:49:18 UTC, 1 replies.
- Suggestions for Posting on the User Mailing List - posted by nsalian <ne...@gmail.com> on 2015/06/17 01:02:59 UTC, 0 replies.
- What happens when a streaming consumer job is killed then restarted? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/17 01:06:38 UTC, 0 replies.
- What is Spark's data retention policy? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/17 01:10:30 UTC, 0 replies.
- Custom Spark metrics? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/17 01:13:47 UTC, 0 replies.
- Re: How to use DataFrame with MySQL - posted by matthewrj <ml...@gmail.com> on 2015/06/17 01:50:34 UTC, 0 replies.
- Submitting Spark Applications using Spark Submit - posted by raggy <ra...@gmail.com> on 2015/06/17 01:53:08 UTC, 13 replies.
- Unable to use more than 1 executor for spark streaming application with YARN - posted by Saiph Kappa <sa...@gmail.com> on 2015/06/17 02:17:24 UTC, 1 replies.
- ClassNotFound exception from closure - posted by Yana Kadiyska <ya...@gmail.com> on 2015/06/17 03:07:08 UTC, 1 replies.
- Spark or Storm - posted by as...@gmail.com on 2015/06/17 03:45:53 UTC, 42 replies.
- questions on the "waiting batches" and "scheduling delay" in Streaming UI - posted by "Fang, Mike" <ch...@paypal.com.INVALID> on 2015/06/17 04:13:56 UTC, 0 replies.
- Incorrect ACL checking for partitioned table in Spark SQL-1.4 - posted by Karthik Subramanian <ka...@philips.com> on 2015/06/17 06:40:46 UTC, 0 replies.
- Re: IPv6 support - posted by Kevin Liu <ke...@fb.com> on 2015/06/17 07:27:10 UTC, 7 replies.
- Read/write metrics for jobs which use S3 - posted by Abhishek Modi <ab...@gmail.com> on 2015/06/17 08:04:59 UTC, 0 replies.
- Interpreting what gets printed as one submits spark application - posted by shreesh <sh...@gmail.com> on 2015/06/17 08:05:04 UTC, 0 replies.
- Shuffle produces one huge partition - posted by Al M <al...@gmail.com> on 2015/06/17 08:45:49 UTC, 1 replies.
- Kerberos authentication exception when spark access hbase with yarn-cluster mode on a kerberos yarn Cluster - posted by 马元文 <ma...@qiyi.com> on 2015/06/17 08:52:55 UTC, 0 replies.
- Can it works in load the MatrixFactorizationModel and predict product with Spark Streaming? - posted by wanbo <ge...@163.com> on 2015/06/17 10:26:13 UTC, 1 replies.
- Implementing and Using a Custom Actor-based Receiver - posted by anshu shukla <an...@gmail.com> on 2015/06/17 11:07:04 UTC, 0 replies.
- Issues building 1.4.0 using make-distribution - posted by Tsai Li Ming <ma...@ltsai.com> on 2015/06/17 11:15:33 UTC, 0 replies.
- Spark Shell Hive Context and Kerberos ticket - posted by Olivier Girardot <o....@lateral-thoughts.com> on 2015/06/17 11:37:17 UTC, 2 replies.
- Documentation for external shuffle service in 1.4.0 - posted by Tsai Li Ming <ma...@ltsai.com> on 2015/06/17 11:41:34 UTC, 0 replies.
- Intermedate stage will be cached automatically ? - posted by canan chen <cc...@gmail.com> on 2015/06/17 11:56:44 UTC, 4 replies.
- spark-sql estimates Cassandra table with 3 rows as 8 TB of data - posted by Serega Sheypak <se...@gmail.com> on 2015/06/17 12:10:58 UTC, 0 replies.
- generateTreeString causes huge performance problems on dataframe persistence - posted by Jan-Paul Bultmann <ja...@me.com> on 2015/06/17 12:17:10 UTC, 2 replies.
- Re: Spark updateStateByKey fails with class leak when using case classes - resend - posted by rsearle <eg...@verizon.net> on 2015/06/17 13:21:55 UTC, 0 replies.
- Job with spark - posted by Sergio Jiménez Barrio <dr...@gmail.com> on 2015/06/17 14:24:40 UTC, 0 replies.
- Twitter Heron: Stream Processing at Scale - Does Spark Address all the issues - posted by Ashish Soni <as...@gmail.com> on 2015/06/17 14:47:55 UTC, 0 replies.
- Is HiveContext Thread Safe? - posted by V Dineshkumar <de...@gmail.com> on 2015/06/17 15:43:41 UTC, 1 replies.
- Using spark.hadoop.* to set Hadoop properties - posted by Corey Nolet <cj...@gmail.com> on 2015/06/17 16:03:57 UTC, 0 replies.
- Re: Loading lots of parquet files into dataframe from s3 - posted by arnonrgo <ar...@rgoarchitects.com> on 2015/06/17 16:57:04 UTC, 1 replies.
- Web UI vs History Server Bugs - posted by jcai <jo...@yale.edu> on 2015/06/17 20:10:51 UTC, 5 replies.
- Re: Can we increase the space of spark standalone cluster - posted by maxdml <ma...@cs.duke.edu> on 2015/06/17 20:37:34 UTC, 1 replies.
- Serial batching with Spark Streaming - posted by Michal Čizmazia <mi...@gmail.com> on 2015/06/17 21:22:32 UTC, 9 replies.
- Hive query execution from Spark(through HiveContext) failing with Apache Sentry - posted by Nitin kak <ni...@gmail.com> on 2015/06/17 21:47:35 UTC, 7 replies.
- Shuffled vs non-shuffled coalesce in Apache Spark - posted by "pawel.jurczenko" <pa...@gmail.com> on 2015/06/17 22:30:13 UTC, 0 replies.
- Re: MLLib: instance weight - posted by Xiangrui Meng <me...@databricks.com> on 2015/06/17 23:54:53 UTC, 0 replies.
- Executor memory allocations - posted by Corey Nolet <cj...@gmail.com> on 2015/06/18 00:02:37 UTC, 1 replies.
- Re: Parallel parameter tuning: distributed execution of MLlib algorithms - posted by Xiangrui Meng <me...@gmail.com> on 2015/06/18 00:58:38 UTC, 0 replies.
- Re: Collabrative Filtering - posted by Xiangrui Meng <me...@gmail.com> on 2015/06/18 01:03:18 UTC, 0 replies.
- Re: Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0 - posted by Xiangrui Meng <me...@gmail.com> on 2015/06/18 01:06:41 UTC, 1 replies.
- Re: spark mlib variance analysis - posted by Xiangrui Meng <me...@gmail.com> on 2015/06/18 01:10:52 UTC, 0 replies.
- Re: Parallel parameter tuning: distributed execution of MLlib algorithms - posted by Peter Rudenko <pe...@gmail.com> on 2015/06/18 01:22:20 UTC, 0 replies.
- Reading maprfs from Spark - posted by Bikrant Neupane <bi...@gmail.com> on 2015/06/18 01:25:21 UTC, 1 replies.
- deployment options for Spark and YARN w/ many app jar library dependencies - posted by "Sweeney, Matt" <ms...@fourv.com> on 2015/06/18 01:47:05 UTC, 2 replies.
- Where are my log4j or exception output on EMR? - posted by Sean Bollin <se...@sean-bollin.com> on 2015/06/18 02:24:47 UTC, 1 replies.
- Is there programmatic way running Spark job on Yarn cluster without using spark-submit script ? - posted by Elkhan Dadashov <el...@gmail.com> on 2015/06/18 02:29:40 UTC, 4 replies.
- Union of many RDDs taking a long time - posted by Matt Forbes <mf...@twitter.com.INVALID> on 2015/06/18 02:53:19 UTC, 1 replies.
- Spark SQL DATE_ADD function - Spark 1.3.1 & 1.4.0 - posted by Nathan McCarthy <Na...@quantium.com.au> on 2015/06/18 03:05:41 UTC, 0 replies.
- understanding on the "waiting batches" and "scheduling delay" in Streaming UI - posted by Mike Fang <ch...@gmail.com> on 2015/06/18 03:59:20 UTC, 4 replies.
- Issue with PySpark UDF on a column of Vectors - posted by Colin Alstad <co...@gmail.com> on 2015/06/18 04:47:54 UTC, 2 replies.
- Why does driver transfer application jar to executors? - posted by Shiyao Ma <i...@introo.me> on 2015/06/18 04:48:19 UTC, 2 replies.
- Iterative Programming by keeping data across micro-batches in spark-streaming? - posted by Nipun Arora <ni...@gmail.com> on 2015/06/18 04:51:28 UTC, 2 replies.
- Re: - posted by Silvio Fiorito <si...@granturing.com> on 2015/06/18 04:52:00 UTC, 22 replies.
- Matrix Multiplication and mllib.recommendation - posted by afarahat <ay...@yahoo.com> on 2015/06/18 05:15:30 UTC, 14 replies.
- Pyspark combination - posted by bhavyateja <bh...@gmail.com> on 2015/06/18 06:10:25 UTC, 0 replies.
- Pyspark RDD search - posted by bhavyateja <bh...@gmail.com> on 2015/06/18 06:16:27 UTC, 1 replies.
- Re: Shuffle produces one huge partition and many tiny partitions - posted by Al M <al...@gmail.com> on 2015/06/18 09:59:25 UTC, 4 replies.
- benchmark my application on hadoop cluster - posted by Pa Rö <pa...@googlemail.com> on 2015/06/18 10:59:16 UTC, 0 replies.
- Machine Learning on GraphX - posted by texol <t....@gmail.com> on 2015/06/18 11:03:49 UTC, 4 replies.
- Fwd: mllib from sparkR - posted by Elena Scardovi <es...@bitbang.com> on 2015/06/18 12:11:06 UTC, 2 replies.
- [Spark Streaming] Iterative programming on an ordered spark stream using Java? - posted by Nipun Arora <ni...@gmail.com> on 2015/06/18 12:36:44 UTC, 3 replies.
- connect mobile app with Spark backend - posted by Ralph Bergmann <ra...@dasralph.de> on 2015/06/18 13:03:30 UTC, 1 replies.
- Got the exception when joining RDD with spark streamRDD - posted by Groupme <gr...@gmail.com> on 2015/06/18 13:25:02 UTC, 1 replies.
- Accumulators / Accumulables : thread-local, task-local, executor-local ? - posted by Guillaume Pitel <gu...@exensa.com> on 2015/06/18 13:36:06 UTC, 7 replies.
- Best way to randomly distribute elements - posted by abellet <au...@telecom-paristech.fr> on 2015/06/18 13:37:13 UTC, 4 replies.
- kafka spark streaming working example - posted by Bartek Radziszewski <ba...@scalaric.com> on 2015/06/18 14:29:34 UTC, 1 replies.
- Spark and Google Cloud Storage - posted by Klaus Schaefers <kl...@ligatus.com> on 2015/06/18 15:31:27 UTC, 1 replies.
- Re: Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher - posted by Yin Huai <yh...@databricks.com> on 2015/06/18 17:20:34 UTC, 1 replies.
- [Spark Streaming] Runtime Error in call to max function for JavaPairRDD - posted by Nipun Arora <ni...@gmail.com> on 2015/06/18 17:44:10 UTC, 5 replies.
- problem with pants building - posted by peixin li <bl...@gmail.com> on 2015/06/18 18:17:00 UTC, 1 replies.
- different schemas per row with DataFrames - posted by Alex Nastetsky <al...@vervemobile.com> on 2015/06/18 18:22:34 UTC, 0 replies.
- Hivecontext going out-of-sync issue - posted by Ranadip Chatterjee <ra...@gmail.com> on 2015/06/18 19:22:22 UTC, 0 replies.
- Spark-sql versus Impala versus Hive - posted by Sanjay Subramanian <sa...@yahoo.com.INVALID> on 2015/06/18 20:08:36 UTC, 3 replies.
- Latency between the RDD in Streaming - posted by anshu shukla <an...@gmail.com> on 2015/06/18 20:24:53 UTC, 6 replies.
- Specify number of partitions with which to run DataFrame.join? - posted by Matt Cheah <mc...@palantir.com> on 2015/06/18 20:49:27 UTC, 0 replies.
- Settings for K-Means Clustering in Mlib for large data set - posted by Rogers Jeffrey <ro...@gmail.com> on 2015/06/18 21:22:48 UTC, 4 replies.
- MLIB-KMEANS: Py4JNetworkError: An error occurred while trying to connect to the Java server , on a huge data set - posted by rogersjeffreyl <ro...@gmail.com> on 2015/06/18 21:36:51 UTC, 0 replies.
- The "Initial job has not accepted any resources" error; can't seem to set - posted by dgoldenberg <dg...@gmail.com> on 2015/06/18 22:47:14 UTC, 1 replies.
- confusing ScalaReflectionException with DataFrames in 1.4 - posted by Chad Urso McDaniel <ch...@gmail.com> on 2015/06/18 23:56:11 UTC, 3 replies.
- Interaction between StringIndexer feature transformer and CrossValidator - posted by cyz <zh...@gmail.com> on 2015/06/19 02:18:19 UTC, 0 replies.
- [SparkSQL]. MissingRequirementError when creating dataframe from RDD (new error in 1.4) - posted by Adam Lewandowski <ad...@gmail.com> on 2015/06/19 02:35:13 UTC, 1 replies.
- Coalescing with shuffle = false in imbalanced cluster - posted by Corey Nolet <cj...@gmail.com> on 2015/06/19 03:03:23 UTC, 0 replies.
- NaiveBayes for MLPipeline is absent - posted by Justin Yip <yi...@prediction.io> on 2015/06/19 03:35:17 UTC, 2 replies.
- createDirectStream and Stats - posted by Tim Smith <se...@gmail.com> on 2015/06/19 04:01:03 UTC, 14 replies.
- Build spark application into uber jar - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/19 04:40:58 UTC, 7 replies.
- how to change /tmp folder for spark ut use sbt - posted by "yuemeng (A)" <yu...@huawei.com> on 2015/06/19 04:58:23 UTC, 1 replies.
- SparkSubmit with Ivy jars is very slow to load with no internet access - posted by Nathan McCarthy <Na...@quantium.com.au> on 2015/06/19 06:53:08 UTC, 1 replies.
- Re: Error when connecting to Spark SQL via Hive JDBC driver - posted by ogoh <ok...@gmail.com> on 2015/06/19 07:41:19 UTC, 1 replies.
- Spark group by sub coulumn - posted by Suraj Shetiya <su...@gmail.com> on 2015/06/19 09:02:23 UTC, 1 replies.
- Re: Header in each output files. - posted by rahulkumar-aws <ra...@gmail.com> on 2015/06/19 09:26:24 UTC, 0 replies.
- N kafka topics vs N spark Streaming - posted by Manohar753 <ma...@happiestminds.com> on 2015/06/19 10:56:52 UTC, 1 replies.
- Code review - Spark SQL command-line client for Cassandra - posted by Matthew Johnson <ma...@algomi.com> on 2015/06/19 11:20:07 UTC, 11 replies.
- About Jobs UI on yarn-client mode - posted by Sea <26...@qq.com> on 2015/06/19 13:37:28 UTC, 0 replies.
- Spark 1.4 on HortonWork HDP 2.2 - posted by Ashish Soni <as...@gmail.com> on 2015/06/19 14:22:27 UTC, 5 replies.
- ERROR in withColumn method - posted by Animesh Baranawal <an...@gmail.com> on 2015/06/19 14:50:33 UTC, 1 replies.
- SparkR - issue when starting the sparkR shell - posted by "Kulkarni, Vikram" <vi...@hp.com> on 2015/06/19 14:53:24 UTC, 1 replies.
- [ERROR] Insufficient Space - posted by Vadim Bichutskiy <va...@gmail.com> on 2015/06/19 16:15:22 UTC, 4 replies.
- Cassandra - Spark 1.3 - reading data from cassandra table with PYSpark - posted by Koen Vantomme <ko...@gmail.com> on 2015/06/19 16:33:45 UTC, 1 replies.
- About Jobs UI in yarn-client mode - posted by Sea <26...@qq.com> on 2015/06/19 17:48:55 UTC, 3 replies.
- What files/folders/jars spark-submit script depend on ? - posted by Elkhan Dadashov <el...@gmail.com> on 2015/06/19 19:12:47 UTC, 2 replies.
- What is needed to integrate Spark with Pandas and scikit-learn? - posted by YaoPau <jo...@gmail.com> on 2015/06/19 19:23:04 UTC, 0 replies.
- Spark Streaming 1.3.0 ERROR LiveListenerBus - posted by Evo Eftimov <ev...@isecc.com> on 2015/06/19 19:31:09 UTC, 0 replies.
- SparkSQL: leftOuterJoin is VERY slow! - posted by Piero Cinquegrana <pc...@marketshare.com> on 2015/06/19 19:48:06 UTC, 2 replies.
- Spark FP-Growth algorithm for frequent sequential patterns - posted by ping yan <sh...@gmail.com> on 2015/06/19 19:51:27 UTC, 2 replies.
- Difference between Lasso regression in MLlib package and ML package - posted by Wei Zhou <zh...@gmail.com> on 2015/06/19 20:38:20 UTC, 5 replies.
- Failed stages and dropped executors when running implicit matrix factorization/ALS - posted by Ravi Mody <rm...@gmail.com> on 2015/06/19 20:43:35 UTC, 11 replies.
- Spark on Yarn - How to configure - posted by Ashish Soni <as...@gmail.com> on 2015/06/19 21:35:38 UTC, 1 replies.
- Missing values support in Mllib yet? - posted by Arun Luthra <ar...@gmail.com> on 2015/06/19 22:23:15 UTC, 1 replies.
- Un-persist RDD in a loop - posted by afarahat <ay...@yahoo.com> on 2015/06/19 22:28:41 UTC, 1 replies.
- Assigning number of workers in spark streaming - posted by anshu shukla <an...@gmail.com> on 2015/06/19 23:29:17 UTC, 3 replies.
- PySpark on YARN "port out of range" - posted by John Meehan <me...@dls.net> on 2015/06/19 23:57:24 UTC, 2 replies.
- Verifying number of workers in Spark Streaming - posted by anshu shukla <an...@gmail.com> on 2015/06/20 11:40:51 UTC, 2 replies.
- Local spark jars not being detected - posted by Ritesh Kumar Singh <ri...@gmail.com> on 2015/06/20 14:38:37 UTC, 3 replies.
- How to get the ALS reconstruction error - posted by afarahat <ay...@yahoo.com> on 2015/06/20 15:25:34 UTC, 0 replies.
- Spark SQL JDBC Source data skew - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2015/06/20 15:47:49 UTC, 1 replies.
- Velox Model Server - posted by Debasish Das <de...@gmail.com> on 2015/06/20 16:51:50 UTC, 16 replies.
- Spark 1.4 History Server - HDP 2.2 - posted by Ashish Soni <as...@gmail.com> on 2015/06/20 18:37:03 UTC, 1 replies.
- Load slf4j from the job assembly instead of from the Spark jar - posted by Mario Pastorelli <ma...@teralytics.ch> on 2015/06/21 01:41:05 UTC, 0 replies.
- Grouping elements in a RDD - posted by Brandon White <bw...@gmail.com> on 2015/06/21 01:48:10 UTC, 1 replies.
- How could output the StreamingLinearRegressionWithSGD prediction result? - posted by Gavin Yue <yu...@gmail.com> on 2015/06/21 06:23:20 UTC, 1 replies.
- Task Serialization Error on DataFrame.foreachPartition - posted by Nishant Patel <ni...@gmail.com> on 2015/06/21 07:10:16 UTC, 2 replies.
- Driver and Executor on the same machine - posted by DStrip <d....@hotmail.com> on 2015/06/21 08:50:09 UTC, 0 replies.
- Fwd: Java Constructor Issues - posted by Shaanan Cohney <sh...@gmail.com> on 2015/06/21 16:24:24 UTC, 1 replies.
- Spark Titan - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2015/06/21 17:20:51 UTC, 2 replies.
- Fwd: How to get and parse whole xml file in HDFS by Spark Streaming - posted by Yong Feng <fe...@gmail.com> on 2015/06/21 20:53:17 UTC, 4 replies.
- s3 - Can't make directory for path - posted by nizang <ni...@windward.eu> on 2015/06/21 22:52:24 UTC, 3 replies.
- Updation of Static variable inside foreachRDD method - posted by anshu shukla <an...@gmail.com> on 2015/06/22 00:09:53 UTC, 0 replies.
- Using Accumulators in Streaming - posted by anshu shukla <an...@gmail.com> on 2015/06/22 01:30:58 UTC, 5 replies.
- Spark 1.4.0 SQL JDBC "partition stride"? - posted by Keith Freeman <8f...@gmail.com> on 2015/06/22 05:04:28 UTC, 0 replies.
- Problem attaching to YARN - posted by Shawn Garbett <sh...@gmail.com> on 2015/06/22 05:08:56 UTC, 1 replies.
- PartitionBy/Partitioner for dataFrames? - posted by Tom Hubregtsen <th...@gmail.com> on 2015/06/22 05:59:39 UTC, 0 replies.
- Reducer memory usage - posted by Corey Nolet <cj...@gmail.com> on 2015/06/22 06:24:58 UTC, 0 replies.
- Re: About Jobs UI in yarn-client mode - posted by Sea <26...@qq.com> on 2015/06/22 06:52:44 UTC, 0 replies.
- How to use an different version of hive - posted by Sea <26...@qq.com> on 2015/06/22 07:13:36 UTC, 0 replies.
- memory needed for each executor - posted by pth001 <Pa...@uni.no> on 2015/06/22 07:27:45 UTC, 1 replies.
- [Spark 1.3.1 SQL] Using Hive - posted by Mike Frampton <mi...@hotmail.com> on 2015/06/22 07:33:06 UTC, 0 replies.
- JavaDStream read and write rdbms - posted by Manohar753 <ma...@happiestminds.com> on 2015/06/22 09:09:20 UTC, 2 replies.
- Spark 1.3 - Connect to to Cassandra - cassandraTable is not recognised by sc - posted by Koen Vantomme <ko...@gmail.com> on 2015/06/22 10:46:00 UTC, 1 replies.
- [Spark Streaming 1.4.0] SPARK-5063, Checkpointing and queuestream - posted by Shaanan Cohney <sh...@gmail.com> on 2015/06/22 12:47:13 UTC, 7 replies.
- Confusion matrix for binary classification - posted by CD Athuraliya <cd...@gmail.com> on 2015/06/22 13:21:49 UTC, 2 replies.
- Serializer not switching - posted by Sean Barzilay <se...@gmail.com> on 2015/06/22 14:07:42 UTC, 3 replies.
- Spark and HDFS ( Worker and Data Nodes Combination ) - posted by Ashish Soni <as...@gmail.com> on 2015/06/22 14:29:19 UTC, 3 replies.
- Re: Custom Metrics Sink - posted by dgoldenberg <dg...@gmail.com> on 2015/06/22 15:56:05 UTC, 0 replies.
- Re: Registering custom metrics - posted by dgoldenberg <dg...@gmail.com> on 2015/06/22 15:57:03 UTC, 4 replies.
- jars are not loading from 1.3. those set via setJars to the SparkContext - posted by Murthy Chelankuri <km...@gmail.com> on 2015/06/22 16:54:06 UTC, 10 replies.
- Calling rdd() on a DataFrame causes stage boundary - posted by Alex Nastetsky <al...@vervemobile.com> on 2015/06/22 17:12:20 UTC, 0 replies.
- Help optimising Spark SQL query - posted by James Aley <ja...@swiftkey.com> on 2015/06/22 17:28:52 UTC, 10 replies.
- Yarn application ID for Spark job on Yarn - posted by roy <rp...@njit.edu> on 2015/06/22 17:45:26 UTC, 1 replies.
- Re: Does HiveContext connect to HiveServer2? - posted by nitinkak001 <ni...@gmail.com> on 2015/06/22 18:13:30 UTC, 1 replies.
- Support for Windowing and Analytics functions in Spark SQL - posted by Sourav Mazumder <so...@gmail.com> on 2015/06/22 18:58:32 UTC, 2 replies.
- Re: GSSException when submitting Spark job in yarn-cluster mode with HiveContext APIs on Kerberos cluster - posted by Olivier Girardot <ss...@gmail.com> on 2015/06/22 19:01:49 UTC, 0 replies.
- Multiple executors writing file using java filewriter - posted by anshu shukla <an...@gmail.com> on 2015/06/22 19:20:29 UTC, 6 replies.
- Why can't I allocate more than 4 executors with 2 machines on YARN? - posted by Saiph Kappa <sa...@gmail.com> on 2015/06/22 20:10:52 UTC, 2 replies.
- SQL vs. DataFrame API - posted by Bob Corsaro <rc...@gmail.com> on 2015/06/22 21:26:12 UTC, 10 replies.
- External Jar file with SparkR - posted by mtn111 <se...@gmail.com> on 2015/06/22 21:59:35 UTC, 0 replies.
- Spark job fails silently - posted by roy <rp...@njit.edu> on 2015/06/22 23:10:18 UTC, 1 replies.
- workaround for groupByKey - posted by Jianguo Li <fl...@gmail.com> on 2015/06/22 23:12:30 UTC, 7 replies.
- spark on yarn failing silently - posted by roy <rp...@njit.edu> on 2015/06/22 23:14:15 UTC, 0 replies.
- New Spark Meetup group in Munich - posted by Danny Linden <ko...@dannylinden.de> on 2015/06/23 01:00:20 UTC, 0 replies.
- which mllib algorithm for large multi-class classification? - posted by Danny <ko...@dannylinden.de> on 2015/06/23 01:21:21 UTC, 2 replies.
- Fwd: Storing an action result in HDFS - posted by ravi tella <dd...@gmail.com> on 2015/06/23 02:28:10 UTC, 3 replies.
- Question about SPARK_WORKER_CORES and spark.task.cpus - posted by Rui Li <sp...@gmail.com> on 2015/06/23 02:56:13 UTC, 1 replies.
- Programming with java on spark - posted by 付雅丹 <ya...@gmail.com> on 2015/06/23 03:28:01 UTC, 2 replies.
- mutable vs. pure functional implementation - StatCounter - posted by mzeltser <mz...@gmail.com> on 2015/06/23 04:25:30 UTC, 1 replies.
- Any way to retrieve time of message arrival to Kafka topic, in Spark Streaming? - posted by dgoldenberg <dg...@gmail.com> on 2015/06/23 06:52:05 UTC, 2 replies.
- MLLIB - Storing the Trained Model - posted by samsudhin <sa...@pigstick.com> on 2015/06/23 08:14:05 UTC, 1 replies.
- Spark standalone cluster - resource management - posted by nizang <ni...@windward.eu> on 2015/06/23 08:18:28 UTC, 4 replies.
- What does [Stage 0:> (0 + 2) / 2] mean on the console - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/23 09:51:28 UTC, 2 replies.
- How to figure out how many records received by individual receiver - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/23 10:30:17 UTC, 0 replies.
- How to disable parquet schema merging in 1.4? - posted by Rex Xiong <by...@gmail.com> on 2015/06/23 11:20:56 UTC, 0 replies.
- Calculating tuple count /input rate with time - posted by anshu shukla <an...@gmail.com> on 2015/06/23 11:49:19 UTC, 1 replies.
- Spark Streaming: limit number of nodes - posted by Wojciech Pituła <w....@gmail.com> on 2015/06/23 13:37:41 UTC, 5 replies.
- Spark launching without all of the requested YARN resources - posted by Arun Luthra <ar...@gmail.com> on 2015/06/23 16:34:53 UTC, 3 replies.
- java.lang.IllegalArgumentException: A metric named ... already exists - posted by Juan Rodríguez Hortalá <ju...@gmail.com> on 2015/06/23 16:59:29 UTC, 1 replies.
- [Spark Streaming] Null Pointer Exception when accessing broadcast variable to store a hashmap in Java - posted by Nipun Arora <ni...@gmail.com> on 2015/06/23 17:01:40 UTC, 5 replies.
- Should I keep memory dedicated for HDFS and Spark on cluster nodes? - posted by maxdml <ma...@cs.duke.edu> on 2015/06/23 17:56:19 UTC, 1 replies.
- org.apache.spark.sql.ScalaReflectionLock - posted by Koert Kuipers <ko...@tresata.com> on 2015/06/23 18:34:52 UTC, 1 replies.
- Limitations using SparkContext - posted by daunnc <da...@gmail.com> on 2015/06/23 18:44:36 UTC, 1 replies.
- SPARK-8566 - posted by Eric Friedman <er...@gmail.com> on 2015/06/23 19:09:24 UTC, 0 replies.
- Can Spark1.4 work with CDH4.6 - posted by Yana Kadiyska <ya...@gmail.com> on 2015/06/23 20:37:57 UTC, 3 replies.
- spark streaming with kafka jar missing - posted by Shushant Arora <sh...@gmail.com> on 2015/06/23 20:56:16 UTC, 4 replies.
- When to use underlying data management layer versus standalone Spark? - posted by commtech <mi...@opco.com> on 2015/06/23 21:46:15 UTC, 3 replies.
- kafka spark streaming with mesos - posted by Bartek Radziszewski <ba...@scalaric.com> on 2015/06/23 22:05:12 UTC, 1 replies.
- Kafka createDirectStream issue - posted by syepes <sy...@gmail.com> on 2015/06/23 22:48:03 UTC, 5 replies.
- How Spark Execute chaining vs no chaining statements - posted by Ashish Soni <as...@gmail.com> on 2015/06/23 23:17:31 UTC, 1 replies.
- RE: Nested DataFrame(SchemaRDD) - posted by Richard Catlin <ri...@gmail.com> on 2015/06/24 01:12:55 UTC, 3 replies.
- map V mapPartitions - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/24 01:57:32 UTC, 2 replies.
- flume sinks supported by spark streaming - posted by Hafiz Mujadid <ha...@gmail.com> on 2015/06/24 05:46:44 UTC, 1 replies.
- when cached RDD will unpersist its data - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/24 07:22:37 UTC, 1 replies.
- How to use KryoSerializer : ClassNotFoundException - posted by pth001 <Pa...@uni.no> on 2015/06/24 09:52:21 UTC, 0 replies.
- how to create custom data source? - posted by 诺铁 <no...@gmail.com> on 2015/06/24 10:39:06 UTC, 0 replies.
- Killing Long running tasks (stragglers) - posted by William Ferrell <wf...@gmail.com> on 2015/06/24 11:02:59 UTC, 2 replies.
- Question - writing data to Cassandra to Spark gives a strange error message - posted by Koen Vantomme <ko...@gmail.com> on 2015/06/24 11:58:49 UTC, 0 replies.
- Parquet problems - posted by Anders Arpteg <ar...@spotify.com> on 2015/06/24 13:10:26 UTC, 2 replies.
- How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames - posted by Gustavo Arjones <ga...@socialmetrix.com> on 2015/06/24 14:27:21 UTC, 2 replies.
- Spark stream test throw org.apache.spark.SparkException: Task not serializable when execute in spark shell - posted by "yuemeng (A)" <yu...@huawei.com> on 2015/06/24 14:41:19 UTC, 1 replies.
- Loss of data due to congestion - posted by anshu shukla <an...@gmail.com> on 2015/06/24 15:18:35 UTC, 3 replies.
- WorkFlow Processing - Spark - posted by Ashish Soni <as...@gmail.com> on 2015/06/24 17:32:36 UTC, 3 replies.
- java.lang.OutOfMemoryError: PermGen space - posted by stati <sr...@gmail.com> on 2015/06/24 17:57:45 UTC, 3 replies.
- dateTime functionality - posted by hbutani <rh...@gmail.com> on 2015/06/24 18:36:05 UTC, 0 replies.
- how to increase parallelism ? - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/24 19:57:25 UTC, 4 replies.
- Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J - posted by maxdml <ma...@cs.duke.edu> on 2015/06/24 20:03:42 UTC, 3 replies.
- com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to read chunk - posted by Piero Cinquegrana <pc...@marketshare.com> on 2015/06/24 21:09:01 UTC, 2 replies.
- Compiling Spark 1.4 (and/or Spark 1.4.1-rc1) with CDH 5.4.1/2 - posted by Aaron <aa...@gmail.com> on 2015/06/24 21:15:21 UTC, 4 replies.
- Spark Python process - posted by Justin Steigel <js...@gmail.com> on 2015/06/24 21:30:46 UTC, 0 replies.
- Spark SQL incompatible with Apache Sentry(Cloudera bundle) - posted by nitinkak001 <ni...@gmail.com> on 2015/06/24 21:37:29 UTC, 0 replies.
- Understanding accumulator during transformations - posted by Wei Zhou <zh...@gmail.com> on 2015/06/24 22:08:41 UTC, 4 replies.
- [sparksql] sparse floating point data compression in sparksql cache - posted by Nikita Dolgov <ni...@beckon.com> on 2015/06/24 22:31:11 UTC, 1 replies.
- Re: EOFException using KryoSerializer - posted by Jim Carroll <ji...@gmail.com> on 2015/06/24 23:15:42 UTC, 0 replies.
- How to Map and Reduce in sparkR - posted by Wei Zhou <zh...@gmail.com> on 2015/06/24 23:59:17 UTC, 5 replies.
- How to run kmeans.py Spark example in yarn-cluster ? - posted by Elkhan Dadashov <el...@gmail.com> on 2015/06/25 00:13:06 UTC, 1 replies.
- HiveContext /Spark much slower than Hive - posted by afarahat <ay...@yahoo.com> on 2015/06/25 00:51:58 UTC, 0 replies.
- Aggregating metrics using Cassandra and Spark streaming - posted by Mike Trienis <mi...@orcsol.com> on 2015/06/25 00:58:27 UTC, 0 replies.
- Spark ec2 cluster lost worker - posted by anny9699 <an...@gmail.com> on 2015/06/25 02:58:54 UTC, 3 replies.
- Nesting DataFrames and saving to Parquet - posted by Richard Catlin <ri...@gmail.com> on 2015/06/25 03:36:05 UTC, 0 replies.
- bugs in Spark PageRank implementation - posted by "Kelly, Terence P (HP Labs Researcher)" <te...@hp.com> on 2015/06/25 06:36:41 UTC, 2 replies.
- Debugging Apache Spark clustered application from Eclipse - posted by nitinkalra2000 <ni...@gmail.com> on 2015/06/25 07:17:37 UTC, 1 replies.
- SparkR parallelize not found with 1.4.1? - posted by Felix C <fe...@hotmail.com> on 2015/06/25 07:24:12 UTC, 3 replies.
- Parsing a tsv file with key value pairs - posted by Ravikant Dindokar <ra...@gmail.com> on 2015/06/25 07:30:57 UTC, 3 replies.
- Killing Long running tasks (stragglers)? - posted by wasauce <wf...@gmail.com> on 2015/06/25 08:30:44 UTC, 0 replies.
- Akka failures: Driver Disassociated - posted by barmaley <ol...@solver.com> on 2015/06/25 08:38:21 UTC, 1 replies.
- Spark Language / Data Base Question - posted by "Sinha, Ujjawal (SFO-MAP)" <Uj...@cadreon.com> on 2015/06/25 09:02:16 UTC, 1 replies.
- spark1.4 sparkR usage - posted by "1106944911@qq.com" <11...@qq.com> on 2015/06/25 09:09:43 UTC, 0 replies.
- JDBCRDD sync with mssql - posted by Manohar753 <ma...@happiestminds.com> on 2015/06/25 09:11:34 UTC, 0 replies.
- Re: spark1.4 sparkR usage - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/06/25 10:55:24 UTC, 5 replies.
- map vs mapPartitions - posted by Shushant Arora <sh...@gmail.com> on 2015/06/25 11:16:26 UTC, 9 replies.
- Re: Problem with version compatibility - posted by Sean Owen <so...@cloudera.com> on 2015/06/25 11:17:16 UTC, 0 replies.
- How to create correct data frame for classification in Spark ML? - posted by dusan <dg...@smileymedia.com> on 2015/06/25 13:04:16 UTC, 0 replies.
- Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded - posted by Roman Sokolov <ol...@gmail.com> on 2015/06/25 15:20:48 UTC, 5 replies.
- Spark Meetup Istanbul - posted by Şafak Serdar Kapçı <ss...@gmail.com> on 2015/06/25 16:16:53 UTC, 2 replies.
- Can I access the Decision Tree Output - posted by "Dempsey, Robert" <Ro...@5one.com> on 2015/06/25 17:46:18 UTC, 0 replies.
- Problem Run Spark Example HBase Code Using Spark-Submit - posted by Bin Wang <bi...@gmail.com> on 2015/06/25 18:01:38 UTC, 2 replies.
- Performing sc.paralleize (..) in workers not in the driver program - posted by shahab <sh...@gmail.com> on 2015/06/25 18:46:00 UTC, 3 replies.
- assign unique ID (Long Value) to each line in RDD - posted by Ravikant Dindokar <ra...@gmail.com> on 2015/06/25 19:25:44 UTC, 1 replies.
- Re: How to get the memory usage information of a spark application - posted by maxdml <ma...@cs.duke.edu> on 2015/06/25 19:28:00 UTC, 0 replies.
- Using Spark on Azure Blob Storage - posted by Daniel Haviv <da...@veracity-group.com> on 2015/06/25 19:37:17 UTC, 5 replies.
- Recent spark sc.textFile needs hadoop for folders?!? - posted by Ashic Mahtab <as...@live.com> on 2015/06/25 20:00:24 UTC, 6 replies.
- java.io.NotSerializableException: org.apache.spark.SparkContext - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/25 20:11:35 UTC, 1 replies.
- Scala/Python or Java - posted by spark user <sp...@yahoo.com.INVALID> on 2015/06/25 21:04:46 UTC, 5 replies.
- Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with) - posted by Elkhan Dadashov <el...@gmail.com> on 2015/06/25 21:17:03 UTC, 4 replies.
- Spark 1.4.0, Secure YARN Cluster, Application Master throws 500 connection refused - posted by Nachiketa <na...@gmail.com> on 2015/06/25 21:22:23 UTC, 1 replies.
- Re: Spark 1.4.0, Secure YARN Cluster, Application Master throws 500 connection refused (Resolved) - posted by Nachiketa <na...@gmail.com> on 2015/06/25 22:20:52 UTC, 0 replies.
- sparkR could not find function "textFile" - posted by Wei Zhou <zh...@gmail.com> on 2015/06/25 22:33:40 UTC, 12 replies.
- sql dataframe internal representation - posted by Koert Kuipers <ko...@tresata.com> on 2015/06/25 22:56:05 UTC, 1 replies.
- reduceByKey - add values to a list - posted by Kannappan Sirchabesan <bu...@gmail.com> on 2015/06/26 00:37:47 UTC, 5 replies.
- Executors requested are way less than what i actually got - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/26 00:57:08 UTC, 4 replies.
- Spark RDS data insertion - posted by Bill Milan <bi...@gmail.com> on 2015/06/26 01:04:43 UTC, 1 replies.
- Vision old applications in webui with json logs - posted by maxdml <ma...@gmail.com> on 2015/06/26 01:19:25 UTC, 0 replies.
- Spark. Efficiency. toDebugString understanding - posted by Eugene Morozov <fa...@list.ru> on 2015/06/26 02:12:55 UTC, 0 replies.
- RE: Nested DataFrames - posted by Richard Catlin <ri...@gmail.com> on 2015/06/26 02:16:43 UTC, 1 replies.
- Spark1.4.0 compiling error with java1.6.0_20: sun.misc.Unsafe cannot be applied to (java.lang.Object,long,java.lang.Object,long,long) - posted by 胡安扬 <zz...@163.com> on 2015/06/26 03:35:51 UTC, 2 replies.
- Spark 1.4 RDD to DF fails with toDF() - posted by stati <sr...@gmail.com> on 2015/06/26 03:35:54 UTC, 5 replies.
- GraphX - ConnectedComponents (Pregel) - longer and longer interval between jobs - posted by Thomas Gerber <th...@radius.com> on 2015/06/26 04:43:24 UTC, 2 replies.
- Re: Failed to save RDD to text File in windows OS - posted by stati <sr...@gmail.com> on 2015/06/26 05:21:59 UTC, 0 replies.
- SparkSQL - understanding Cross Joins - posted by Night Wolf <ni...@gmail.com> on 2015/06/26 07:28:34 UTC, 0 replies.
- ALS :how to set numUserBlocks and numItemBlocks - posted by afarahat <ay...@yahoo.com> on 2015/06/26 07:42:13 UTC, 0 replies.
- Spark for distributed dbms cluster - posted by "louis.hust" <lo...@gmail.com> on 2015/06/26 08:37:52 UTC, 1 replies.
- how to do table partitioning efficiently? - posted by 诺铁 <no...@gmail.com> on 2015/06/26 08:41:40 UTC, 0 replies.
- Problem after enabling Hadoop native libraries - posted by Arunabha Ghosh <ar...@gmail.com> on 2015/06/26 10:39:44 UTC, 2 replies.
- Time is ugly in Spark Streaming.... - posted by Sea <26...@qq.com> on 2015/06/26 11:06:16 UTC, 2 replies.
- [Spark 1.3.1] Spark HiveQL -> CDH 5.3 Hive 0.13 UDF's - posted by Mike Frampton <mi...@hotmail.com> on 2015/06/26 11:40:43 UTC, 0 replies.
- The usage of OpenBLAS - posted by Tsai Li Ming <ma...@ltsai.com> on 2015/06/26 12:15:07 UTC, 0 replies.
- [SparkScore]Performance portal for Apache Spark - WW26 - posted by "Huang, Jie" <ji...@intel.com> on 2015/06/26 13:24:38 UTC, 4 replies.
- spark streaming with kafka reset offset - posted by Shushant Arora <sh...@gmail.com> on 2015/06/26 13:43:17 UTC, 11 replies.
- Spark driver hangs on start of job - posted by Sjoerd Mulder <sj...@gmail.com> on 2015/06/26 14:28:03 UTC, 2 replies.
- Kafka Direct Stream - Custom Serialization and Deserialization - posted by Ashish Soni <as...@gmail.com> on 2015/06/26 14:39:48 UTC, 3 replies.
- Dependency Injection with Spark Java - posted by Michal Čizmazia <mi...@gmail.com> on 2015/06/26 14:49:44 UTC, 2 replies.
- Re: Time is ugly in Spark Streaming.... - posted by Sea <26...@qq.com> on 2015/06/26 14:59:47 UTC, 1 replies.
- spark streaming - checkpoint - posted by ram kumar <ra...@gmail.com> on 2015/06/26 15:05:55 UTC, 3 replies.
- Time series data - posted by Caio Cesar Trucolo <tr...@gmail.com> on 2015/06/26 15:07:29 UTC, 1 replies.
- Spark 1.4.0 - Using SparkR on EC2 Instance - posted by RedOakMark <ma...@redoakstrategic.com> on 2015/06/26 15:27:00 UTC, 9 replies.
- How to recover in case user errors in streaming - posted by Amit Assudani <aa...@impetus.com> on 2015/06/26 16:05:00 UTC, 12 replies.
- Re: Master dies after program finishes normally - posted by Yifan LI <ia...@gmail.com> on 2015/06/26 16:22:32 UTC, 1 replies.
- Re: hadoop input/output format advanced control - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/26 16:37:43 UTC, 0 replies.
- Re: Kryo serialization of classes in additional jars - posted by patcharee <Pa...@uni.no> on 2015/06/26 17:35:45 UTC, 0 replies.
- Accessing Kerberos Secured HDFS Resources from Spark on Mesos - posted by Dave Ariens <da...@blackberry.com> on 2015/06/26 17:49:01 UTC, 18 replies.
- Multiple dir support : newApiHadoopFile - posted by Bahubali Jain <ba...@gmail.com> on 2015/06/26 18:43:59 UTC, 3 replies.
- spilling in-memory map of 5.1 MB to disk (272 times so far) - posted by "igor.berman" <ig...@gmail.com> on 2015/06/26 19:07:04 UTC, 1 replies.
- Re: Unable to specify multiple directories as input - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/26 19:18:09 UTC, 0 replies.
- Spark SQL - Setting YARN Classpath for primordial class loader - posted by Kumaran Mani <ku...@gmail.com> on 2015/06/26 19:29:16 UTC, 0 replies.
- YARN worker out of disk memory - posted by Tarun Garg <bi...@live.com> on 2015/06/26 19:41:20 UTC, 0 replies.
- How to concatenate two csv files into one RDD? - posted by Rex X <dn...@gmail.com> on 2015/06/26 20:00:59 UTC, 1 replies.
- Cannot iterate items in rdd.mapPartition() - posted by "Wang, Ningjun (LNG-NPV)" <ni...@lexisnexis.com> on 2015/06/26 20:52:57 UTC, 1 replies.
- spark streaming job fails to restart after checkpointing due to DStream initialization errors - posted by Ashish Nigam <as...@gmail.com> on 2015/06/26 22:45:55 UTC, 3 replies.
- Spark 1.4 - memory bloat in group by/aggregate??? - posted by Manoj Samel <ma...@gmail.com> on 2015/06/26 23:13:44 UTC, 0 replies.
- Unable to start Pi (hello world) application on Spark 1.4 - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/26 23:27:03 UTC, 2 replies.
- Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Too many values to unpack - posted by Ayman Farahat <ay...@yahoo.com.INVALID> on 2015/06/26 23:28:42 UTC, 0 replies.
- dataframe left joins are not working as expected in pyspark - posted by Axel Dahl <ax...@whisperstream.com> on 2015/06/27 05:00:45 UTC, 7 replies.
- Uncaught exception in thread delete Spark local dirs - posted by Guillermo Ortiz <ko...@gmail.com> on 2015/06/27 09:35:18 UTC, 8 replies.
- JavaRDD and saveAsNewAPIHadoopFile() - posted by Bahubali Jain <ba...@gmail.com> on 2015/06/27 12:42:28 UTC, 1 replies.
- R "on spark" - posted by Evo Eftimov <ev...@isecc.com> on 2015/06/27 13:33:08 UTC, 0 replies.
- Re: Uncaught exception in thread delete Spark local dirs - posted by Sea <26...@qq.com> on 2015/06/27 16:34:13 UTC, 0 replies.
- How to timeout a task? - posted by wasauce <wf...@gmail.com> on 2015/06/27 17:33:33 UTC, 1 replies.
- rdd.saveAsSequenceFile(path) - posted by Pat Ferrel <pa...@occamsmachete.com> on 2015/06/27 23:46:13 UTC, 0 replies.
- Spark-Submit / Spark-Shell Error Standalone cluster - posted by Ashish Soni <as...@gmail.com> on 2015/06/28 04:53:14 UTC, 5 replies.
- Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition - posted by Ayman Farahat <ay...@yahoo.com.INVALID> on 2015/06/28 05:16:37 UTC, 0 replies.
- Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition - posted by Sabarish Sasidharan <sa...@manthan.com> on 2015/06/28 05:50:13 UTC, 4 replies.
- problem for submitting job - posted by 郭谦 <bu...@gmail.com> on 2015/06/28 09:56:51 UTC, 2 replies.
- required: org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector] - posted by Arthur Chan <ar...@gmail.com> on 2015/06/28 14:49:07 UTC, 4 replies.
- What does "Spark is not just MapReduce" mean? Isn't every Spark job a form of MapReduce? - posted by YaoPau <jo...@gmail.com> on 2015/06/28 18:13:18 UTC, 3 replies.
- Use logback instead of log4j in a Spark job - posted by Mario Pastorelli <ma...@teralytics.ch> on 2015/06/28 19:10:18 UTC, 0 replies.
- spark-submit in deployment mode with the "--jars" option - posted by hishamm <hi...@unige.ch> on 2015/06/28 19:44:14 UTC, 1 replies.
- Share your cluster & run details - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/28 20:59:03 UTC, 0 replies.
- Re: What does "Spark is not just MapReduce" mean? Isn't every Spark job a form of MapReduce? - posted by Koert Kuipers <ko...@tresata.com> on 2015/06/28 22:44:15 UTC, 1 replies.
- Fine control with sc.sequenceFile - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/29 06:23:29 UTC, 5 replies.
- Spark SQL parallel query submission via single HiveContext - posted by V Dineshkumar <de...@gmail.com> on 2015/06/29 08:20:45 UTC, 0 replies.
- How to find how many cores are allocated to Executor - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/29 10:29:44 UTC, 2 replies.
- SparkContext and JavaSparkContext - posted by Hao Ren <in...@gmail.com> on 2015/06/29 11:15:03 UTC, 1 replies.
- Re: kmeans broadcast - posted by Himanshu Mehra <hi...@gmail.com> on 2015/06/29 11:30:42 UTC, 0 replies.
- got "java.lang.reflect.UndeclaredThrowableException" when running multiply APPs in spark - posted by lu...@sina.com on 2015/06/29 11:38:39 UTC, 1 replies.
- Re: Scala problem when using g.vertices.map "not a member of type parameter" - posted by Robineast <Ro...@xense.co.uk> on 2015/06/29 12:12:49 UTC, 0 replies.
- Spark Streaming-Receiver drops data - posted by summerdaway <su...@gmail.com> on 2015/06/29 12:17:35 UTC, 0 replies.
- Serialization Exception - posted by Spark Enthusiast <sp...@yahoo.in> on 2015/06/29 14:14:54 UTC, 1 replies.
- load Java properties file in Spark - posted by diplomatic Guru <di...@gmail.com> on 2015/06/29 14:51:05 UTC, 1 replies.
- Directory creation failed leads to job fail (should it?) - posted by maxdml <ma...@gmail.com> on 2015/06/29 15:04:46 UTC, 4 replies.
- Schema for type is not supported - posted by Sander van Dijk <sg...@gmail.com> on 2015/06/29 16:07:41 UTC, 0 replies.
- Running Spark 1.4.1 without Hadoop - posted by Sourav Mazumder <so...@gmail.com> on 2015/06/29 16:24:35 UTC, 9 replies.
- SPARK REMOTE DEBUG - posted by Pietro Gentile <pi...@gmail.com> on 2015/06/29 17:37:09 UTC, 1 replies.
- Spark shell crumbles after memory is full - posted by hbogert <ha...@gmail.com> on 2015/06/29 17:43:10 UTC, 3 replies.
- [SparkR] Missing Spark APIs in R - posted by Pradeep Bashyal <pr...@bashyal.com> on 2015/06/29 18:40:33 UTC, 3 replies.
- Load Multiple DB Table - Spark SQL - posted by Ashish Soni <as...@gmail.com> on 2015/06/29 19:47:54 UTC, 1 replies.
- SparkSQL built in functions - posted by Bob Corsaro <rc...@gmail.com> on 2015/06/29 20:27:17 UTC, 4 replies.
- s3 bucket access/read file - posted by didi <di...@gmail.com> on 2015/06/29 20:29:13 UTC, 3 replies.
- Applying functions over certain count of tuples . - posted by anshu shukla <an...@gmail.com> on 2015/06/29 21:57:44 UTC, 2 replies.
- is there any significant performance issue converting between rdd and dataframes in pyspark? - posted by Axel Dahl <ax...@whisperstream.com> on 2015/06/29 22:27:16 UTC, 0 replies.
- Job failed but there is no proper reason - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/29 23:22:07 UTC, 5 replies.
- Checkpoint support? - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/29 23:29:31 UTC, 1 replies.
- Checkpoint FS failure or connectivity issue - posted by Amit Assudani <aa...@impetus.com> on 2015/06/29 23:34:08 UTC, 1 replies.
- breeze.linalg.DenseMatrix not found - posted by AlexG <sw...@gmail.com> on 2015/06/29 23:51:10 UTC, 2 replies.
- Subsecond queries possible? - posted by Eric Pederson <er...@gmail.com> on 2015/06/30 01:01:33 UTC, 11 replies.
- Need clarification on spark on cluster set up instruction - posted by manish ranjan <cs...@gmail.com> on 2015/06/30 01:32:05 UTC, 0 replies.
- spark streaming HDFS file issue - posted by ravi tella <dd...@gmail.com> on 2015/06/30 03:59:50 UTC, 1 replies.
- Shuffle files lifecycle - posted by Thomas Gerber <th...@radius.com> on 2015/06/30 04:12:34 UTC, 3 replies.
- Re: How to recover in case user errors in streaming - posted by Sea <26...@qq.com> on 2015/06/30 04:23:23 UTC, 0 replies.
- How do i speed up my Spark App - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/30 07:11:28 UTC, 0 replies.
- Spark 1.4.0: read.df() causes excessive IO - posted by Exie <tf...@prodevelop.com.au> on 2015/06/30 07:14:45 UTC, 0 replies.
- Error while installing spark - posted by Chintan Bhatt <ch...@charusat.ac.in> on 2015/06/30 07:30:26 UTC, 1 replies.
- Can Dependencies Be Resolved on Spark Cluster? - posted by SLiZn Liu <sl...@gmail.com> on 2015/06/30 07:46:44 UTC, 3 replies.
- Talk on Deep dive into Spark Data source API - posted by madhu phatak <ph...@gmail.com> on 2015/06/30 10:17:01 UTC, 1 replies.
- Explanation of the numbers on Spark Streaming UI - posted by "bit1129@163.com" <bi...@163.com> on 2015/06/30 11:42:57 UTC, 1 replies.
- MLLib- Probabilities with LogisticRegression - posted by Klaus Schaefers <kl...@ligatus.com> on 2015/06/30 13:00:33 UTC, 1 replies.
- DataFrame registerTempTable Concurrent Access - posted by prosp4300 <pr...@163.com> on 2015/06/30 15:41:57 UTC, 0 replies.
- Spark Dataframe 1.4 (GroupBy partial match) - posted by Suraj Shetiya <su...@gmail.com> on 2015/06/30 16:05:53 UTC, 1 replies.
- Difference between spark-defaults.conf and SparkConf.set - posted by Yana Kadiyska <ya...@gmail.com> on 2015/06/30 16:17:59 UTC, 1 replies.
- Spark streaming on standalone cluster - posted by Borja Garrido Bear <ka...@gmail.com> on 2015/06/30 16:59:21 UTC, 1 replies.
- Spark driver using Spark Streaming shows increasing memory/CPU usage - posted by easyonthemayo <ne...@velocityww.com> on 2015/06/30 18:48:24 UTC, 1 replies.
- run reduceByKey on huge data in spark - posted by hotdog <li...@163.com> on 2015/06/30 19:03:26 UTC, 2 replies.
- Dataframes to EdgeRDD (GraphX) using Scala api to Spark - posted by zblanton <ze...@gmail.com> on 2015/06/30 19:24:40 UTC, 0 replies.
- Grouping runs of elements in a RDD - posted by RJ Nowling <rn...@gmail.com> on 2015/06/30 20:01:57 UTC, 4 replies.
- Want to avoid groupByKey as its running for ever - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/06/30 20:29:31 UTC, 2 replies.
- Issues in reading a CSV file from local file system using spark-shell - posted by Sourav Mazumder <so...@gmail.com> on 2015/06/30 21:06:33 UTC, 0 replies.
- Estimating Task memory - posted by Giovanni Paolo Gibilisco <gi...@polimi.it> on 2015/06/30 21:17:58 UTC, 0 replies.
- Running Spark program testing using scalatest and maven: cluster master exception - posted by lagerspetz <ee...@cs.helsinki.fi> on 2015/06/30 22:10:11 UTC, 0 replies.
- Check for null in PySpark DataFrame - posted by pedro <sk...@gmail.com> on 2015/06/30 23:28:11 UTC, 0 replies.
- Retrieve hadoop conf object from Python API - posted by Richard Ding <pi...@gmail.com> on 2015/06/30 23:41:37 UTC, 0 replies.