dev@spark.apache.org, 2016-04

- Re: Making BatchPythonEvaluation actually Batch - posted by Davies Liu <da...@databricks.com> on 2016/04/01 01:41:32 UTC, 0 replies.
- [discuss] using deep learning to improve Spark - posted by Reynold Xin <rx...@databricks.com> on 2016/04/01 09:15:06 UTC, 4 replies.
- Re: Any documentation on Spark's security model beyond YARN? - posted by Michael Segel <ms...@hotmail.com> on 2016/04/01 14:23:11 UTC, 0 replies.
- Re: how about a custom coalesce() policy? - posted by Nezih Yigitbasi <ny...@netflix.com.INVALID> on 2016/04/01 19:03:24 UTC, 3 replies.
- Declare rest of @Experimental items non-experimental if they've existed since 1.2.0 - posted by Renyi Xiong <re...@gmail.com> on 2016/04/01 19:10:22 UTC, 1 replies.
- Eliminating shuffle write and spill disk IO reads/writes in Spark - posted by Michael Slavitch <sl...@gmail.com> on 2016/04/01 20:32:05 UTC, 14 replies.
- Re: Spark SQL UDF Returning Rows - posted by Michael Armbrust <mi...@databricks.com> on 2016/04/01 22:26:27 UTC, 0 replies.
- Re: What influences the space complexity of Spark operations? - posted by Michael Armbrust <mi...@databricks.com> on 2016/04/01 22:29:11 UTC, 1 replies.
- Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle - posted by Koert Kuipers <ko...@tresata.com> on 2016/04/02 01:10:25 UTC, 10 replies.
- RE: Declare rest of @Experimental items non-experimental if they've existed since 1.2.0 - posted by Renyi Xiong <re...@gmail.com> on 2016/04/02 03:59:03 UTC, 0 replies.
- (Unknown) - posted by Avinash Dongre <do...@gmail.com> on 2016/04/03 13:20:06 UTC, 0 replies.
- [SQL] Dataset.map gives error: missing parameter type for expanded function? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/03 21:23:05 UTC, 1 replies.
- explain codegen - posted by Ted Yu <yu...@gmail.com> on 2016/04/04 01:00:38 UTC, 7 replies.
- Re: [discuss] ending support for Java 7 in Spark 2.0 - posted by Reynold Xin <rx...@databricks.com> on 2016/04/04 07:28:41 UTC, 8 replies.
- Spark 1.6.1 binary pre-built for Hadoop 2.6 may be broken - posted by Kousuke Saruta <sa...@oss.nttdata.co.jp> on 2016/04/04 14:49:33 UTC, 0 replies.
- RDD Partitions not distributed evenly to executors - posted by Mike Hynes <91...@gmail.com> on 2016/04/04 15:12:13 UTC, 7 replies.
- Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt? - posted by Nicholas Chammas <ni...@gmail.com> on 2016/04/04 15:58:23 UTC, 13 replies.
- Re: running lda in spark throws exception - posted by Joseph Bradley <jo...@databricks.com> on 2016/04/04 20:33:12 UTC, 0 replies.
- error: reference to sql is ambiguous after import org.apache.spark._ in shell? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/05 02:16:39 UTC, 1 replies.
- Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory - posted by Reynold Xin <rx...@databricks.com> on 2016/04/05 03:16:05 UTC, 4 replies.
- Build changes after SPARK-13579 - posted by Marcelo Vanzin <va...@cloudera.com> on 2016/04/05 05:00:42 UTC, 3 replies.
- Build with Thrift Server & Scala 2.11 - posted by Raymond Honderdors <Ra...@sizmek.com> on 2016/04/05 12:44:28 UTC, 5 replies.
- Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0 - posted by Xiangrui Meng <me...@gmail.com> on 2016/04/05 20:01:16 UTC, 10 replies.
- Spark Streaming UI reporting a different task duration - posted by Renyi Xiong <re...@gmail.com> on 2016/04/05 20:02:05 UTC, 0 replies.
- Updating Spark PR builder and 2.x test jobs to use Java 8 JDK - posted by Josh Rosen <jo...@databricks.com> on 2016/04/05 23:14:00 UTC, 3 replies.
- [STREAMING] DStreamClosureSuite.scala with { return; ssc.sparkContext.emptyRDD[Int] } Why?! - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/06 00:40:22 UTC, 2 replies.
- BROKEN BUILD? Is this only me or not? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/06 01:51:02 UTC, 3 replies.
- ClassCastException when extracting and collecting DF array column type - posted by Nick Pentreath <ni...@gmail.com> on 2016/04/06 13:35:43 UTC, 1 replies.
- Big Data Interview FAQ - posted by Chaturvedi Chola <ch...@gmail.com> on 2016/04/06 15:22:21 UTC, 0 replies.
- Agenda Announced for Spark Summit 2016 in San Francisco - posted by Scott walent <sc...@gmail.com> on 2016/04/06 18:44:29 UTC, 0 replies.
- Executor shutdown hooks? - posted by Sung Hwan Chung <co...@gmail.com> on 2016/04/06 21:24:18 UTC, 4 replies.
- Possible bug related to [SPARK-5708] - posted by Mihir Monani <mm...@salesforce.com> on 2016/04/07 09:31:39 UTC, 0 replies.
- Delegation of Kerberos tokens - posted by Wojciech Indyk <wo...@gmail.com> on 2016/04/08 11:01:30 UTC, 1 replies.
- [build system] taking amp-jenkins-worker-06 and -07 offline due to disk space issues - posted by shane knapp <sk...@berkeley.edu> on 2016/04/08 20:18:41 UTC, 1 replies.
- How Spark handles dead machines during a job. - posted by Sung Hwan Chung <co...@gmail.com> on 2016/04/09 06:19:18 UTC, 1 replies.
- [BUILD FAILURE] Spark Project ML Local Library - me or it's real? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/09 21:01:43 UTC, 3 replies.
- spark graphx storage RDD memory leak - posted by zhang juntao <ju...@gmail.com> on 2016/04/10 18:08:59 UTC, 4 replies.
- Spark Sql on large number of files (~500Megs each) fails after couple of hours - posted by Yash Sharma <ya...@gmail.com> on 2016/04/11 04:46:11 UTC, 3 replies.
- Spark Jenkins test configurations - posted by cherry_zhang <ch...@intel.com> on 2016/04/11 04:54:24 UTC, 0 replies.
- Different maxBins value for categorical and continuous features in RandomForest implementation. - posted by Rahul Tanwani <ta...@gmail.com> on 2016/04/11 11:06:59 UTC, 2 replies.
- Re: [Streaming] textFileStream has no events shown in web UI - posted by Yogesh Mahajan <ym...@snappydata.io> on 2016/04/11 20:10:56 UTC, 0 replies.
- Possible deadlock in registering applications in the recovery mode - posted by Niranda Perera <ni...@gmail.com> on 2016/04/12 09:16:19 UTC, 4 replies.
- SparkSQL - Limit pushdown on BroadcastHashJoin - posted by Rajesh Balamohan <ra...@gmail.com> on 2016/04/12 14:32:02 UTC, 9 replies.
- Spark on Mesos 0.28 issue - posted by Yang Lei <ge...@gmail.com> on 2016/04/12 23:05:29 UTC, 3 replies.
- Re: Spark 1.6.1 packages on S3 corrupt? - posted by Nicholas Chammas <ni...@gmail.com> on 2016/04/13 03:05:09 UTC, 0 replies.
- Accessing Secure Hadoop from Mesos cluster - posted by Tony Kinsley <tk...@gmail.com> on 2016/04/13 06:57:33 UTC, 1 replies.
- jdbc/save DataFrameWriter implementation change - posted by "Justin.Pihony" <ju...@gmail.com> on 2016/04/13 07:51:38 UTC, 0 replies.
- Code freeze? - posted by Sean Owen <so...@cloudera.com> on 2016/04/13 09:45:21 UTC, 3 replies.
- Should localProperties be inheritable? Should we change that or document it? - posted by Marcin Tustin <mt...@handybook.com> on 2016/04/13 15:15:31 UTC, 2 replies.
- DynamoDB data source questions - posted by Travis Crawford <tr...@gmail.com> on 2016/04/13 16:45:35 UTC, 2 replies.
- Dataset.explain, ExplainCommand and sqlContext.executePlan twice? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/14 01:43:36 UTC, 0 replies.
- Coding style question (about extra anonymous closure within functional transformations) - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/04/14 04:46:55 UTC, 4 replies.
- [Streaming] Infinite delay when stopping the context - posted by Sergio Ramírez <sr...@ugr.es> on 2016/04/14 12:18:50 UTC, 0 replies.
- Organizing Spark ML example packages - posted by Nick Pentreath <ni...@gmail.com> on 2016/04/14 14:28:34 UTC, 3 replies.
- BytesToBytes and unaligned memory - posted by Adam Roberts <AR...@uk.ibm.com> on 2016/04/15 16:01:09 UTC, 7 replies.
- Unable to access Resource Manager /Name Node on port 9026 / 9101 on a Spark EMR Cluster - posted by Chadha Pooja <Ch...@bcg.com> on 2016/04/15 16:29:04 UTC, 2 replies.
- Skipping Type Conversion and using InternalRows for UDF - posted by Hamel Kothari <ha...@gmail.com> on 2016/04/15 17:44:53 UTC, 1 replies.
- Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends - posted by Luciano Resende <lu...@gmail.com> on 2016/04/15 18:01:55 UTC, 21 replies.
- ClassFormatError in latest spark 2 SNAPSHOT build - posted by Koert Kuipers <ko...@tresata.com> on 2016/04/15 18:38:18 UTC, 1 replies.
- Will not store rdd_16_4383 as it would require dropping another block from the same RDD - posted by Alexander Pivovarov <ap...@gmail.com> on 2016/04/15 21:40:01 UTC, 0 replies.
- Using local-cluster mode for testing Spark-related projects - posted by Evan Chan <ve...@gmail.com> on 2016/04/17 02:47:17 UTC, 4 replies.
- Question about Scala style, explicit typing within transformation functions and anonymous val. - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/04/17 08:06:09 UTC, 5 replies.
- Impact of STW GC events for the driver JVM on overall cluster - posted by Rahul Tanwani <ta...@gmail.com> on 2016/04/17 19:27:49 UTC, 2 replies.
- Recent Jenkins always fails in specific two tests - posted by Kazuaki Ishizaki <IS...@jp.ibm.com> on 2016/04/17 20:26:23 UTC, 2 replies.
- [build system] issue w/jenkins - posted by shane knapp <sk...@berkeley.edu> on 2016/04/18 19:02:28 UTC, 2 replies.
- Implicit from ProcessingTime to scala.concurrent.duration.Duration? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/18 19:23:30 UTC, 3 replies.
- More elaborate toString for StreamExecution? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/18 20:05:22 UTC, 0 replies.
- auto closing pull requests that have been inactive > 30 days? - posted by Reynold Xin <rx...@databricks.com> on 2016/04/18 21:02:15 UTC, 20 replies.
- inter spark application communication - posted by Soumitra Johri <so...@gmail.com> on 2016/04/18 21:04:54 UTC, 1 replies.
- more uniform exception handling? - posted by Reynold Xin <rx...@databricks.com> on 2016/04/18 21:16:45 UTC, 4 replies.
- YARN Shuffle service and its compatibility - posted by Mark Grover <ma...@apache.org> on 2016/04/18 22:51:01 UTC, 14 replies.
- Re: [spark.ml] Why is private class ColumnPruner? - posted by Yanbo Liang <yb...@gmail.com> on 2016/04/19 08:55:57 UTC, 1 replies.
- Introduction to Spark workshop, May 9, New York - posted by Rich Bowen <rb...@apache.org> on 2016/04/19 15:16:09 UTC, 0 replies.
- Question about storage memory in unified memory manager - posted by Patrick Woody <pa...@gmail.com> on 2016/04/19 15:32:37 UTC, 1 replies.
- RFC: Remote "HBaseTest" from examples? - posted by Marcelo Vanzin <va...@cloudera.com> on 2016/04/19 19:15:31 UTC, 5 replies.
- Re: RFC: Remove "HBaseTest" from examples? - posted by Ted Yu <yu...@gmail.com> on 2016/04/19 19:20:09 UTC, 18 replies.
- Spark sql and hive into different result with same sql - posted by FangFang Chen <lu...@163.com> on 2016/04/20 12:06:06 UTC, 0 replies.
- Re: Spark sql and hive into different result with same sql - posted by FangFang Chen <lu...@163.com> on 2016/04/20 12:45:28 UTC, 0 replies.
- Re: Re: Spark sql and hive into different result with same sql - posted by FangFang Chen <lu...@163.com> on 2016/04/20 14:25:59 UTC, 0 replies.
- Improving system design logging in spark - posted by atootoonchian <al...@levyx.com> on 2016/04/20 19:45:48 UTC, 3 replies.
- Re: Re: Re: Spark sql and hive into different result with same sql - posted by FangFang Chen <lu...@163.com> on 2016/04/21 08:34:44 UTC, 0 replies.
- [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage - posted by atootoonchian <al...@levyx.com> on 2016/04/21 19:29:53 UTC, 5 replies.
- [GRAPHX] Graph Algorithms and Spark - posted by tgensol <th...@gmail.com> on 2016/04/21 20:47:18 UTC, 5 replies.
- Re: executor delay in Spark - posted by Mike Hynes <91...@gmail.com> on 2016/04/23 01:40:30 UTC, 2 replies.
- Proposal of closing some PRs which at least one of committers suggested so - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/04/23 05:56:54 UTC, 2 replies.
- Spark streaming Kafka receiver WriteAheadLog question - posted by Renyi Xiong <re...@gmail.com> on 2016/04/23 06:49:53 UTC, 2 replies.
- Help us teach Spark and grow the Spark community - posted by "Anthony D. Joseph" <ad...@eecs.berkeley.edu> on 2016/04/23 07:18:53 UTC, 0 replies.
- Spark 2.0 vs 1.6.1 Query Time using Thrift server - posted by Raymond Honderdors <Ra...@sizmek.com> on 2016/04/24 15:50:57 UTC, 1 replies.
- net.razorvine.pickle.PickleException in Pyspark - posted by Caique Marques <ca...@gmail.com> on 2016/04/24 18:39:43 UTC, 1 replies.
- Spark sql with large sql syntax job failed with outofmemory error and grows beyond 64k warn - posted by FangFang Chen <lu...@163.com> on 2016/04/25 06:49:48 UTC, 0 replies.
- Do transformation functions on RDD invoke a Job [sc.runJob]? - posted by Praveen Devarao <pr...@in.ibm.com> on 2016/04/25 07:54:19 UTC, 4 replies.
- Cache Shuffle Based Operation Before Sort - posted by Ali Tootoonchian <al...@levyx.com> on 2016/04/25 21:50:27 UTC, 1 replies.
- [build system] short downtime wednesday morning (4-27-16), 7-9am - posted by shane knapp <sk...@berkeley.edu> on 2016/04/26 01:50:56 UTC, 2 replies.
- Re: spark git commit: [HOTFIX] Fix compilation - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/26 06:32:11 UTC, 0 replies.
- Re: spark git commit: [HOTFIX] Fix the problem for real this time. - posted by Jacek Laskowski <ja...@japila.pl> on 2016/04/26 06:58:37 UTC, 0 replies.
- Number of partitions for binaryFiles - posted by "Ulanov, Alexander" <al...@hpe.com> on 2016/04/26 20:10:38 UTC, 4 replies.
- HDFS as Shuffle Service - posted by Michael Gummelt <mg...@mesosphere.io> on 2016/04/26 20:20:15 UTC, 15 replies.
- Decrease shuffle in TreeAggregate with coalesce ? - posted by Guillaume Pitel <gu...@exensa.com> on 2016/04/27 13:46:53 UTC, 3 replies.
- Duplicated fit into TrainValidationSplit - posted by Dirceu Semighini Filho <di...@gmail.com> on 2016/04/27 14:29:19 UTC, 2 replies.
- Error running spark-sql-perf version 0.3.2 against Spark 1.6 - posted by Michael Slavitch <sl...@gmail.com> on 2016/04/27 20:34:16 UTC, 0 replies.
- RDD.broadcast - posted by Io...@nomura.com on 2016/04/28 11:33:03 UTC, 6 replies.
- certification suite? - posted by William Benton <wi...@redhat.com> on 2016/04/28 16:08:36 UTC, 0 replies.
- Spark streaming concurrent job scheduling question - posted by Renyi Xiong <re...@gmail.com> on 2016/04/28 18:17:09 UTC, 0 replies.
- Using Spark when data definitions are unknowable at compile time - posted by _na <ni...@seeq.com> on 2016/04/28 18:34:17 UTC, 1 replies.
- Unsubscribe - posted by "Varanasi, Venkata" <ve...@bankofamerica.com> on 2016/04/28 20:35:04 UTC, 0 replies.
- spark 2 segfault - posted by Koert Kuipers <ko...@tresata.com> on 2016/04/28 20:35:06 UTC, 1 replies.
- Re: Spark ML - Scaling logistic regression for many features - posted by Daniel Siegmann <da...@teamaol.com> on 2016/04/28 21:06:42 UTC, 0 replies.
- Re: Tungsten off heap memory access for C++ libraries - posted by "jpivarski@gmail.com" <jp...@gmail.com> on 2016/04/28 21:56:26 UTC, 1 replies.
- SparkR unit test failures on local master - posted by Gayathri Murali <ga...@gmail.com> on 2016/04/28 22:19:42 UTC, 2 replies.
- ConvertToSafe being done before functions.explode - posted by Hamel Kothari <ha...@gmail.com> on 2016/04/28 22:57:22 UTC, 0 replies.
- Unit test error - posted by JaeSung Jun <ja...@gmail.com> on 2016/04/29 06:38:24 UTC, 0 replies.
- unsubscribe - posted by Rob Turner <ro...@gmail.com> on 2016/04/29 14:36:56 UTC, 0 replies.
- Requesting feedback for PR for SPARK-11962 - posted by Arun Allamsetty <ar...@gmail.com> on 2016/04/29 21:00:37 UTC, 0 replies.
- [build system] short downtime monday morning (5-2-16), 7-9am PDT - posted by shane knapp <sk...@berkeley.edu> on 2016/04/29 21:52:44 UTC, 0 replies.
- RPC Timeout and Abnormally Long JvmGcTime - posted by Wes Holler <wh...@algebraixdata.com> on 2016/04/30 01:09:17 UTC, 1 replies.