dev@spark.apache.org, 2016-03

- Re: What should be spark.local.dir in spark on yarn? - posted by Jeff Zhang <zj...@gmail.com> on 2016/03/01 00:44:53 UTC, 2 replies.
- Support virtualenv in PySpark - posted by Jeff Zhang <zj...@gmail.com> on 2016/03/01 06:07:20 UTC, 1 replies.
- Spark performance comparison for research - posted by yasincelik <ya...@gmail.com> on 2016/03/01 06:25:54 UTC, 2 replies.
- Re: Is spark.driver.maxResultSize used correctly ? - posted by Reynold Xin <rx...@databricks.com> on 2016/03/01 10:00:52 UTC, 1 replies.
- SPARK-SQL: Pattern Detection on Live Event or Archived Event Data - posted by Jerry Lam <ch...@gmail.com> on 2016/03/01 15:16:18 UTC, 12 replies.
- Re: [Proposal] Enabling time series analysis on spark metrics - posted by Karan Kumar <ka...@gmail.com> on 2016/03/01 17:17:13 UTC, 2 replies.
- Dataframe Partitioning - posted by Teng Liao <tl...@palantir.com> on 2016/03/02 00:19:26 UTC, 3 replies.
- HashedRelation Memory Pressure on Broadcast Joins - posted by Matt Cheah <mc...@palantir.com> on 2016/03/02 02:17:43 UTC, 5 replies.
- Selecting column in dataframe created with incompatible schema causes AnalysisException - posted by Ewan Leith <ew...@realitymine.com> on 2016/03/02 10:44:26 UTC, 1 replies.
- Re: Upgrading to Kafka 0.9.x - posted by Cody Koeninger <co...@koeninger.org> on 2016/03/02 20:17:23 UTC, 0 replies.
- [VOTE] Release Apache Spark 1.6.1 (RC1) - posted by Michael Armbrust <mi...@databricks.com> on 2016/03/02 23:45:09 UTC, 15 replies.
- About the exception "Received LaunchTask command but executor was null" - posted by Sea <26...@qq.com> on 2016/03/03 03:44:24 UTC, 0 replies.
- getting a list of executors for use in getPreferredLocations - posted by Cody Koeninger <co...@koeninger.org> on 2016/03/04 00:08:42 UTC, 3 replies.
- Fwd: spark master ui to proxy app and worker ui - posted by Gurvinder Singh <gu...@uninett.no> on 2016/03/04 09:25:44 UTC, 1 replies.
- Set up a Coverity scan for Spark - posted by Sean Owen <so...@cloudera.com> on 2016/03/04 11:34:54 UTC, 6 replies.
- Re: Mapper side join with DataFrames API - posted by Deepak Gopalakrishnan <dg...@gmail.com> on 2016/03/04 13:02:42 UTC, 1 replies.
- GraphX optimizations - posted by Khaled Ammar <kh...@gmail.com> on 2016/03/04 18:53:30 UTC, 1 replies.
- Use cases for kafka direct stream messageHandler - posted by Cody Koeninger <co...@koeninger.org> on 2016/03/04 22:39:14 UTC, 3 replies.
- Fwd: Spark SQL drops the HIVE table in "overwrite" mode while writing into table - posted by Dhaval Modi <dh...@gmail.com> on 2016/03/05 16:02:10 UTC, 2 replies.
- Typo in community databricks cloud docs - posted by Eugene Morozov <ev...@gmail.com> on 2016/03/06 01:23:22 UTC, 1 replies.
- Spark Custom Partitioner not picked - posted by Prabhu Joseph <pr...@gmail.com> on 2016/03/06 21:48:05 UTC, 0 replies.
- PySpark, spill-related (possibly psutil) issue, throwing an exception '_fill_function() takes exactly 4 arguments (5 given)' - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/03/07 03:19:45 UTC, 2 replies.
- Nulls getting converted to 0 with spark 2.0 SNAPSHOT - posted by Franklyn D'souza <fr...@shopify.com> on 2016/03/07 20:30:01 UTC, 1 replies.
- Adding hive context gives error - posted by Suniti Singh <su...@gmail.com> on 2016/03/08 01:15:37 UTC, 1 replies.
- Dynamic allocation availability on standalone mode. Misleading doc. - posted by Eugene Morozov <ev...@gmail.com> on 2016/03/08 01:25:11 UTC, 3 replies.
- Does anyone implement org.apache.spark.serializer.Serializer in their own code? - posted by Josh Rosen <jo...@databricks.com> on 2016/03/08 03:57:44 UTC, 2 replies.
- ML ALS API - posted by Maciej Szymkiewicz <ms...@gmail.com> on 2016/03/08 07:20:14 UTC, 1 replies.
- Re: More Robust DataSource Parameters - posted by Reynold Xin <rx...@databricks.com> on 2016/03/08 07:51:55 UTC, 0 replies.
- BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml - posted by Jacek Laskowski <ja...@japila.pl> on 2016/03/08 08:38:22 UTC, 10 replies.
- Spark structured streaming - posted by Praveen Devarao <pr...@in.ibm.com> on 2016/03/08 10:38:05 UTC, 4 replies.
- Spark 2.0 high level API doc - posted by Reynold Xin <rx...@databricks.com> on 2016/03/09 00:06:51 UTC, 0 replies.
- Spark Scheduler creating Straggler Node - posted by Prabhu Joseph <pr...@gmail.com> on 2016/03/09 05:17:10 UTC, 4 replies.
- Inconsistent file extensions and omitting file extensions written by CSV, TEXT and JSON data sources. - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/03/09 06:49:09 UTC, 3 replies.
- submissionTime vs batchTime, DirectKafka - posted by Sachin Aggarwal <di...@gmail.com> on 2016/03/09 18:09:28 UTC, 6 replies.
- Request to add a new book to the Books section on Spark's website - posted by Mohammed Guller <mo...@glassbeam.com> on 2016/03/09 20:45:12 UTC, 1 replies.
- [RESULT] [VOTE] Release Apache Spark 1.6.1 (RC1) - posted by Michael Armbrust <mi...@databricks.com> on 2016/03/10 01:07:50 UTC, 0 replies.
- dataframe.groupby.agg vs sql("select from groupby)") - posted by FangFang Chen <lu...@163.com> on 2016/03/10 09:19:22 UTC, 1 replies.
- DynamicPartitionKafkaRDD - 1:n mapping between kafka and RDD partition - posted by Renyi Xiong <re...@gmail.com> on 2016/03/10 20:59:33 UTC, 3 replies.
- [ANNOUNCE] Announcing Spark 1.6.1 - posted by Michael Armbrust <mi...@databricks.com> on 2016/03/10 22:08:04 UTC, 0 replies.
- Understanding fault tolerance in shuffle operations - posted by Matt Cheah <mc...@palantir.com> on 2016/03/11 00:26:28 UTC, 0 replies.
- Running ALS on comparitively large RDD - posted by Deepak Gopalakrishnan <dg...@gmail.com> on 2016/03/11 04:52:35 UTC, 4 replies.
- Contributing to managed memory, Tungsten.. - posted by Jan Kotek <di...@kotek.net> on 2016/03/11 08:25:36 UTC, 2 replies.
- Re: pull request template - posted by Marcelo Vanzin <va...@cloudera.com> on 2016/03/12 01:31:50 UTC, 5 replies.
- Re: Spark ML - Scaling logistic regression for many features - posted by Nick Pentreath <ni...@gmail.com> on 2016/03/12 11:53:48 UTC, 2 replies.
- Compare a column in two different tables/find the distance between column data - posted by Suniti Singh <su...@gmail.com> on 2016/03/15 04:46:38 UTC, 4 replies.
- SparkConf constructor now private - posted by Koert Kuipers <ko...@tresata.com> on 2016/03/15 15:37:31 UTC, 2 replies.
- spark 2.0 logging binary incompatibility - posted by Koert Kuipers <ko...@tresata.com> on 2016/03/15 15:49:45 UTC, 4 replies.
- Release Announcement: XGBoost4J - Portable Distributed XGBoost in Spark, Flink and Dataflow - posted by Nan Zhu <zh...@gmail.com> on 2016/03/15 16:53:11 UTC, 0 replies.
- question about catalyst and TreeNode - posted by Koert Kuipers <ko...@tresata.com> on 2016/03/15 17:01:44 UTC, 1 replies.
- Re: Various forks - posted by Sean Owen <so...@cloudera.com> on 2016/03/15 18:24:40 UTC, 2 replies.
- Accessing SparkConf in metrics sink - posted by Pete Robbins <ro...@gmail.com> on 2016/03/16 08:38:12 UTC, 3 replies.
- [POWERED BY] Please add our organization - posted by Craig Lukasik <cl...@zaloni.com> on 2016/03/16 17:18:10 UTC, 2 replies.
- graceful shutdown in external data sources - posted by Dan Burkert <da...@cloudera.com> on 2016/03/16 19:19:06 UTC, 9 replies.
- [discuss] making SparkEnv private in Spark 2.0 - posted by Reynold Xin <rx...@databricks.com> on 2016/03/16 22:52:27 UTC, 3 replies.
- Re: df.dtypes -> pyspark.sql.types - posted by Reynold Xin <rx...@databricks.com> on 2016/03/16 23:44:43 UTC, 0 replies.
- Spark 1.6.1 Hadoop 2.6 package on S3 corrupt? - posted by Nicholas Chammas <ni...@gmail.com> on 2016/03/17 01:15:10 UTC, 14 replies.
- Fwd: Apache Spark Exception in thread “main” java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class - posted by satyajit vegesna <sa...@gmail.com> on 2016/03/17 03:05:42 UTC, 2 replies.
- Spark build with scala-2.10 fails ? - posted by Jeff Zhang <zj...@gmail.com> on 2016/03/17 07:46:00 UTC, 2 replies.
- SPARK-13843 and future of streaming backends - posted by Marcelo Vanzin <va...@cloudera.com> on 2016/03/17 19:14:49 UTC, 37 replies.
- PySpark API divergence + improving pandas interoperability - posted by Wes McKinney <we...@cloudera.com> on 2016/03/18 00:25:16 UTC, 3 replies.
- CfP 11th Workshop on Virtualization in High-Performance Cloud Computing (VHPC '16) - posted by VHPC 16 <vh...@gmail.com> on 2016/03/18 16:30:04 UTC, 0 replies.
- SparkContext.stop() takes too long to complete - posted by Nezih Yigitbasi <ny...@netflix.com.INVALID> on 2016/03/19 00:16:24 UTC, 0 replies.
- Fwd: DF creation - posted by satyajit vegesna <sa...@gmail.com> on 2016/03/19 01:30:38 UTC, 1 replies.
- Request for comments: Tensorframes, an integration library between TensorFlow and Spark DataFrames - posted by Tim Hunter <ti...@databricks.com> on 2016/03/19 02:18:50 UTC, 0 replies.
- Can we remove private[spark] from Metrics Source and SInk traits? - posted by Pete Robbins <ro...@gmail.com> on 2016/03/19 08:32:58 UTC, 4 replies.
- MLPC model can not be saved - posted by HanPan <pa...@thinkingdata.cn> on 2016/03/21 04:31:51 UTC, 1 replies.
- CTAS support in sparksql - posted by Kashish Jain <Ka...@guavus.com> on 2016/03/21 07:08:07 UTC, 0 replies.
- Performance improvements for sorted RDDs - posted by JOAQUIN GUANTER GONZALBEZ <jo...@telefonica.com> on 2016/03/21 09:06:34 UTC, 3 replies.
- subscribe - posted by Namrata Thanvi <na...@gmail.com> on 2016/03/21 11:38:48 UTC, 0 replies.
- java.lang.OutOfMemoryError: Unable to acquire bytes of memory - posted by Nezih Yigitbasi <ny...@netflix.com.INVALID> on 2016/03/21 18:29:51 UTC, 5 replies.
- Re: SparkML algos limitations question. - posted by Eugene Morozov <ev...@gmail.com> on 2016/03/21 19:22:20 UTC, 1 replies.
- Merging ML Estimator and Model - posted by Joseph Bradley <jo...@databricks.com> on 2016/03/21 19:53:29 UTC, 0 replies.
- SPARK-13843 Next steps - posted by Kostas Sakellis <ko...@cloudera.com> on 2016/03/22 08:27:19 UTC, 10 replies.
- error occurs to compile spark 1.6.1 using scala 2.11.8 - posted by Allen <al...@126.com> on 2016/03/22 11:07:16 UTC, 1 replies.
- StatefulNetworkWordCount behaviour - posted by Rishi Mishra <rm...@snappydata.io> on 2016/03/22 12:57:42 UTC, 0 replies.
- toPandas very slow - posted by Josh Levy-Kramer <jo...@starcount.com> on 2016/03/22 13:40:12 UTC, 3 replies.
- Job description only visible after job finish - posted by hansbogert <ha...@gmail.com> on 2016/03/22 13:41:44 UTC, 0 replies.
- new object store driver for Spark - posted by Gil Vernik <GI...@il.ibm.com> on 2016/03/22 14:34:18 UTC, 0 replies.
- EclairJS for "Powered by Spark" Wiki page - posted by DavidFallside <da...@fallside.com> on 2016/03/22 17:46:46 UTC, 1 replies.
- spark 2.0 snapshot change in RowEncoder behavior - posted by Koert Kuipers <ko...@tresata.com> on 2016/03/23 22:20:03 UTC, 1 replies.
- [ml] Two ClassificationModels are final and two are not - why? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/03/24 07:06:32 UTC, 0 replies.
- Re: MLPC model can not be saved - posted by HanPan <pa...@thinkingdata.cn> on 2016/03/24 08:15:34 UTC, 0 replies.
- [discuss] ending support for Java 7 in Spark 2.0 - posted by Reynold Xin <rx...@databricks.com> on 2016/03/24 08:27:45 UTC, 59 replies.
- Does SparkSql has official jdbc/odbc driver ? - posted by sage <lk...@gmail.com> on 2016/03/25 07:56:26 UTC, 2 replies.
- Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/03/25 16:05:02 UTC, 7 replies.
- [spark.ml] Why is private class ColumnPruner? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/03/25 16:50:07 UTC, 0 replies.
- Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends - posted by Luciano Resende <lu...@gmail.com> on 2016/03/26 18:07:21 UTC, 3 replies.
- BlockManager WARNINGS and ERRORS - posted by salexln <sa...@gmail.com> on 2016/03/27 21:24:57 UTC, 1 replies.
- OOM and "spark.buffer.pageSize" - posted by Steve Johnston <sj...@algebraixdata.com> on 2016/03/28 21:07:46 UTC, 2 replies.
- Master options Cluster/Client descrepencies. - posted by satyajit vegesna <sa...@gmail.com> on 2016/03/29 04:19:45 UTC, 2 replies.
- SparkML RandomForest java.lang.StackOverflowError - posted by Eugene Morozov <ev...@gmail.com> on 2016/03/29 12:12:36 UTC, 0 replies.
- Understanding PySpark Internals - posted by Adam Roberts <AR...@uk.ibm.com> on 2016/03/29 17:21:02 UTC, 1 replies.
- Any documentation on Spark's security model beyond YARN? - posted by Michael Segel <ms...@hotmail.com> on 2016/03/29 23:19:31 UTC, 0 replies.
- aggregateByKey on PairRDD - posted by Suniti Singh <su...@gmail.com> on 2016/03/30 02:36:20 UTC, 2 replies.
- Re: Null pointer exception when using com.databricks.spark.csv - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/03/30 06:03:12 UTC, 2 replies.
- Re: Any documentation on Spark's security model beyond YARN? - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2016/03/30 09:06:31 UTC, 3 replies.
- Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle - posted by Sean Owen <so...@cloudera.com> on 2016/03/30 15:45:16 UTC, 10 replies.
- Spark SQL UDF Returning Rows - posted by Hamel Kothari <ha...@gmail.com> on 2016/03/30 17:47:19 UTC, 3 replies.
- Question Create External table location S3 - posted by Raymond Honderdors <Ra...@sizmek.com> on 2016/03/31 11:00:51 UTC, 2 replies.
- Jenkins PR failing, Mima unhappy: bad constant pool tag 50 at byte 12 - posted by Steve Loughran <st...@hortonworks.com> on 2016/03/31 13:52:55 UTC, 0 replies.
- What influences the space complexity of Spark operations? - posted by Steve Johnston <sj...@algebraixdata.com> on 2016/03/31 18:26:28 UTC, 0 replies.