dev@spark.apache.org, 2016-11

- Re: Python Spark Improvements (forked from Spark Improvement Proposals) - posted by mariusvniekerk <ma...@gmail.com> on 2016/11/01 03:30:46 UTC, 2 replies.
- Re: Odp.: Spark Improvement Proposals - posted by Reynold Xin <rx...@databricks.com> on 2016/11/01 07:09:45 UTC, 2 replies.
- Re: Updating Parquet dep to 1.9 - posted by Sean Owen <so...@cloudera.com> on 2016/11/01 09:22:14 UTC, 5 replies.
- Question about using collaborative filtering in MLlib - posted by Zak H <za...@gmail.com> on 2016/11/01 17:00:25 UTC, 2 replies.
- Re: JIRA Components for Streaming - posted by Michael Armbrust <mi...@databricks.com> on 2016/11/01 22:47:42 UTC, 0 replies.
- view canonicalization - looking for database gurus to chime in - posted by Reynold Xin <rx...@databricks.com> on 2016/11/01 23:42:48 UTC, 1 replies.
- Re: getting encoder implicits to be more accurate - posted by Sam Goodwin <sa...@gmail.com> on 2016/11/02 04:15:41 UTC, 7 replies.
- [VOTE] Release Apache Spark 2.0.2 (RC2) - posted by Reynold Xin <rx...@databricks.com> on 2016/11/02 04:51:38 UTC, 13 replies.
- Re: [VOTE] Release Apache Spark 2.0.2 (RC1) - posted by vijoshi <vi...@in.ibm.com> on 2016/11/02 04:54:17 UTC, 0 replies.
- [ANNOUNCE] Apache Spark branch-2.1 - posted by Reynold Xin <rx...@databricks.com> on 2016/11/02 05:49:29 UTC, 0 replies.
- Anyone seeing a lot of Spark emails go to Gmail spam? - posted by Sean Owen <so...@cloudera.com> on 2016/11/02 08:18:13 UTC, 4 replies.
- Handling questions in the mailing lists - posted by "assaf.mendelson" <as...@rsa.com> on 2016/11/02 11:32:43 UTC, 36 replies.
- BiMap BroadCast Variable - Kryo Serialization Issue - posted by Kalpana Jalawadi <ka...@gmail.com> on 2016/11/02 19:05:47 UTC, 0 replies.
- Structured streaming aggregation - update mode - posted by Cristian Opris <cr...@gmail.com> on 2016/11/02 22:24:25 UTC, 1 replies.
- Blocked PySpark changes - posted by Holden Karau <ho...@pigscanfly.ca> on 2016/11/02 23:48:25 UTC, 0 replies.
- Re: [VOTE] Release Apache Spark 1.6.3 (RC1) - posted by Reynold Xin <rx...@databricks.com> on 2016/11/03 00:38:02 UTC, 0 replies.
- [VOTE] Release Apache Spark 1.6.3 (RC2) - posted by Reynold Xin <rx...@databricks.com> on 2016/11/03 00:40:03 UTC, 14 replies.
- Evolutionary algorithm (EA) in Spark - posted by Chris Lin <ch...@hotmail.com> on 2016/11/03 05:49:06 UTC, 1 replies.
- Running Unit Tests in pyspark failure - posted by Krishna Kalyan <kr...@gmail.com> on 2016/11/03 15:16:47 UTC, 1 replies.
- . - posted by Per Ullberg <pe...@klarna.com> on 2016/11/03 16:55:41 UTC, 0 replies.
- AnalysisException in first/last during aggregation since 2.0.1 - posted by emlyn <Em...@microsoft.com> on 2016/11/03 20:15:25 UTC, 0 replies.
- Re: Continuous warning while consuming using new kafka-spark010 API - posted by vonnagy <iv...@vadio.com> on 2016/11/04 17:14:48 UTC, 1 replies.
- Hadoop Summit EU 2017 - posted by Owen O'Malley <om...@apache.org> on 2016/11/04 17:30:27 UTC, 0 replies.
- Anyone want to weigh in on a Kafka DStreams api change? - posted by Cody Koeninger <co...@koeninger.org> on 2016/11/04 17:39:29 UTC, 0 replies.
- why visitCreateFileFormat doesn`t support hive STORED BY ,just support store as - posted by 母延年（YDB技术支持） <18...@qq.com> on 2016/11/05 12:27:55 UTC, 0 replies.
- Structured Streaming with Kafka Source, does it work?? - posted by shyla <de...@gmail.com> on 2016/11/07 01:13:17 UTC, 2 replies.
- Using mention-bot to automatically ping potential reviewers - posted by Nicholas Chammas <ni...@gmail.com> on 2016/11/07 02:25:20 UTC, 1 replies.
- Re: Spark Improvement Proposals - posted by Reynold Xin <rx...@databricks.com> on 2016/11/07 18:10:54 UTC, 3 replies.
- Re: REST api for monitoring Spark Streaming - posted by Chan Chor Pang <ch...@indetail.co.jp> on 2016/11/08 01:30:13 UTC, 2 replies.
- [ANNOUNCE] Announcing Apache Spark 1.6.3 - posted by Reynold Xin <rx...@databricks.com> on 2016/11/08 06:07:40 UTC, 0 replies.
- [VOTE] Release Apache Spark 2.0.2 (RC3) - posted by Reynold Xin <rx...@databricks.com> on 2016/11/08 06:09:30 UTC, 26 replies.
- Issue + Resolution: Kmeans Spark Performances (ML package) - posted by Zakaria Hili <za...@gmail.com> on 2016/11/08 15:33:57 UTC, 3 replies.
- Diffing execution plans to understand an optimizer bug - posted by Nicholas Chammas <ni...@gmail.com> on 2016/11/08 21:42:04 UTC, 4 replies.
- Connectors using new Kafka consumer API - posted by Mark Grover <ma...@apache.org> on 2016/11/08 23:26:57 UTC, 4 replies.
- Would "alter table add column" be supported in the future? - posted by 汪洋 <ti...@icloud.com> on 2016/11/09 17:02:15 UTC, 1 replies.
- Contributing to Spark in GSoC 2017 - posted by Krishna Kalyan <kr...@gmail.com> on 2016/11/10 01:33:01 UTC, 0 replies.
- Failed to run spark jobs on mesos due to "hadoop" not found. - posted by Yu Wei <yu...@hotmail.com> on 2016/11/10 11:27:50 UTC, 0 replies.
- If we run sc.textfile(path,xxx) many times, will the elements be the same in each partition - posted by WangJianfei <wa...@otcaix.iscas.ac.cn> on 2016/11/10 13:43:22 UTC, 0 replies.
- Spark Streaming: question on sticky session across batches ? - posted by Manish Malhotra <ma...@gmail.com> on 2016/11/10 16:42:49 UTC, 1 replies.
- Is `randomized aggregation test` testsuite stable? - posted by Dongjoon Hyun <do...@apache.org> on 2016/11/10 16:48:49 UTC, 3 replies.
- Reduce the memory usage if we do same first in GradientBoostedTrees if subsamplingRate< 1.0 - posted by WangJianfei <wa...@otcaix.iscas.ac.cn> on 2016/11/11 13:13:21 UTC, 2 replies.
- spark sql query of nested json lists data - posted by robert <zh...@yahoo.com> on 2016/11/11 19:53:50 UTC, 0 replies.
- withExpr private method duplication in Column and functions objects? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/11/11 21:12:33 UTC, 1 replies.
- ShuffleExchange#nodeName...a duplication...perhaps?! - posted by Jacek Laskowski <ja...@japila.pl> on 2016/11/12 12:39:55 UTC, 0 replies.
- does The Design of spark consider the scala parallelize collections? - posted by WangJianfei <wa...@otcaix.iscas.ac.cn> on 2016/11/12 13:57:46 UTC, 1 replies.
- Component naming in the PR title - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/11/12 17:27:20 UTC, 3 replies.
- how does isDistinct work on expressions - posted by "assaf.mendelson" <as...@rsa.com> on 2016/11/13 11:03:06 UTC, 3 replies.
- Converting spark types and standard scala types - posted by "assaf.mendelson" <as...@rsa.com> on 2016/11/13 11:28:45 UTC, 0 replies.
- On the use of catalyst.dsl package and deserialize vs CatalystSerde.deserialize - posted by Jacek Laskowski <ja...@japila.pl> on 2016/11/13 11:57:58 UTC, 0 replies.
- statistics collection and propagation for cost-based optimizer - posted by Reynold Xin <rx...@databricks.com> on 2016/11/14 01:30:00 UTC, 7 replies.
- Two questions about running spark on mesos - posted by Yu Wei <yu...@hotmail.com> on 2016/11/14 10:58:38 UTC, 2 replies.
- subscribe - posted by Yu Wei <yu...@hotmail.com> on 2016/11/14 11:45:08 UTC, 0 replies.
- Re: Spark-SQL parameters like shuffle.partitions should be stored in the lineage - posted by leo9r <le...@gmail.com> on 2016/11/15 00:19:50 UTC, 5 replies.
- [ANNOUNCE] Apache Spark 2.0.2 - posted by Reynold Xin <rx...@databricks.com> on 2016/11/15 05:14:24 UTC, 4 replies.
- separate spark and hive - posted by "assaf.mendelson" <as...@rsa.com> on 2016/11/15 07:23:39 UTC, 7 replies.
- Fwd: - posted by Anton Okolnychyi <an...@gmail.com> on 2016/11/15 11:26:59 UTC, 0 replies.
- How statistical key rune time - posted by 王桥石 <ui...@163.com> on 2016/11/15 15:32:49 UTC, 0 replies.
- Fwd: using Spark Streaming with Kafka 0.9/0.10 - posted by aakash aakash <em...@gmail.com> on 2016/11/15 18:58:34 UTC, 4 replies.
- NodeManager heap size with ExternalShuffleService - posted by Artur Sukhenko <ar...@gmail.com> on 2016/11/15 20:45:43 UTC, 2 replies.
- Running lint-java during PR builds? - posted by Marcelo Vanzin <va...@cloudera.com> on 2016/11/15 21:21:10 UTC, 2 replies.
- Re: Reduce the memory usage if we do same first in GradientBoostedTrees if subsamplingRate< 1.0 - posted by WangJianfei <wa...@otcaix.iscas.ac.cn> on 2016/11/16 01:02:27 UTC, 0 replies.
- How do I convert json_encoded_blob_column into a data frame? (This may be a feature request) - posted by kant kodali <ka...@gmail.com> on 2016/11/16 09:44:39 UTC, 7 replies.
- Insert data into hive internal tables using hiveContext - posted by NN...@in.imshealth.com on 2016/11/16 11:29:47 UTC, 0 replies.
- Develop custom Estimator / Transformer for pipeline - posted by Georg Heiler <ge...@gmail.com> on 2016/11/16 14:29:02 UTC, 5 replies.
- Kafka segmentation - posted by Hoang Bao Thien <hb...@gmail.com> on 2016/11/16 22:45:17 UTC, 0 replies.
- SparkILoop doesn't run - posted by Mohit Jaggi <mo...@gmail.com> on 2016/11/16 22:47:20 UTC, 2 replies.
- Multiple streaming aggregations in structured streaming - posted by wszxyh <ws...@163.com> on 2016/11/17 02:58:52 UTC, 3 replies.
- SQL Syntax for pivots - posted by Niranda Perera <ni...@gmail.com> on 2016/11/17 06:44:37 UTC, 1 replies.
- issues with github pull request notification emails missing - posted by Reynold Xin <rx...@databricks.com> on 2016/11/17 07:20:25 UTC, 3 replies.
- structured streaming and window functions - posted by "assaf.mendelson" <as...@rsa.com> on 2016/11/17 08:19:31 UTC, 6 replies.
- Another Interesting Question on SPARK SQL - posted by kant kodali <ka...@gmail.com> on 2016/11/17 09:28:30 UTC, 1 replies.
- Green dot in web UI DAG visualization - posted by Nicholas Chammas <ni...@gmail.com> on 2016/11/17 17:10:00 UTC, 6 replies.
- Jackson Spark/app incompatibility and how to resolve it - posted by Michael Allman <mi...@videoamp.com> on 2016/11/17 17:59:48 UTC, 0 replies.
- [build system] massive jenkins infrastructure changes forthcoming - posted by shane knapp <sk...@berkeley.edu> on 2016/11/17 22:33:12 UTC, 2 replies.
- Re: Failed to run spark jobs on mesos due to "hadoop" not found. - posted by Meethu Mathew <me...@flytxt.com> on 2016/11/18 11:26:29 UTC, 0 replies.
- Analyzing and reusing cached Datasets - posted by Jacek Laskowski <ja...@japila.pl> on 2016/11/19 20:19:07 UTC, 2 replies.
- Re: OutOfMemoryError on parquet SnappyDecompressor - posted by Aniket <an...@gmail.com> on 2016/11/20 11:35:40 UTC, 3 replies.
- github mirroring is broken - posted by Reynold Xin <rx...@databricks.com> on 2016/11/20 20:17:40 UTC, 0 replies.
- How is the order ensured in the jdbc relation provider when inserting data from multiple executors - posted by Niranda Perera <ni...@gmail.com> on 2016/11/21 09:03:07 UTC, 5 replies.
- MinMaxScaler behaviour - posted by Joeri Hermans <jo...@cern.ch> on 2016/11/21 20:25:31 UTC, 2 replies.
- Re: Memory leak warnings in Spark 2.0.1 - posted by Nicholas Chammas <ni...@gmail.com> on 2016/11/21 21:16:51 UTC, 2 replies.
- Please limit commits for branch-2.1 - posted by Joseph Bradley <jo...@databricks.com> on 2016/11/21 23:19:35 UTC, 3 replies.
- [SPARK-16654][CORE][WIP] Add UI coverage for Application Level Blacklisting - posted by Jose Soltren <jo...@cloudera.com> on 2016/11/22 03:31:45 UTC, 2 replies.
- Re: How to convert spark data-frame to datasets? - posted by Sachith Withana <sa...@wso2.com> on 2016/11/22 05:03:08 UTC, 1 replies.
- [SparkStreaming] 1 SQL tab for each SparkStreaming batch in SparkUI - posted by Dirceu Semighini Filho <di...@gmail.com> on 2016/11/22 13:51:32 UTC, 3 replies.
- Is it possible to pass "-javaagent=customAgent.jar" into spark as a JAVA_OPTS - posted by Zak H <za...@gmail.com> on 2016/11/22 21:50:46 UTC, 1 replies.
- Spark Wiki now migrated to spark.apache.org - posted by Sean Owen <so...@cloudera.com> on 2016/11/23 11:29:07 UTC, 2 replies.
- Aggregating over sorted data - posted by "assaf.mendelson" <as...@rsa.com> on 2016/11/23 12:36:39 UTC, 0 replies.
- PowerIterationClustering can't handle "large" files - posted by Lydia Ickler <ic...@googlemail.com> on 2016/11/23 12:37:38 UTC, 0 replies.
- [Spark Thriftserver] connection timeout option? - posted by Artur Sukhenko <ar...@gmail.com> on 2016/11/23 14:46:20 UTC, 0 replies.
- FOSDEM 2017 HPC, Bigdata and Data Science DevRoom CFP is closing soon - posted by Roman Shaposhnik <ro...@shaposhnik.org> on 2016/11/23 21:11:19 UTC, 0 replies.
- SparkUI via proxy - posted by marco rocchi <ro...@studenti.uniroma1.it> on 2016/11/24 15:33:28 UTC, 5 replies.
- Parquet-like partitioning support in spark SQL's in-memory columnar cache - posted by Nitin Goyal <ni...@gmail.com> on 2016/11/25 05:06:09 UTC, 2 replies.
- Scaling issues due to contention in Random - posted by Prasun Ratn <pr...@gmail.com> on 2016/11/25 07:29:46 UTC, 1 replies.
- [SQL][JDBC] Possible regression in JDBC reader - posted by Maciej Szymkiewicz <ms...@gmail.com> on 2016/11/25 12:50:55 UTC, 4 replies.
- Third party library - posted by vineet chadha <st...@gmail.com> on 2016/11/25 22:33:09 UTC, 1 replies.
- Use of BroadcastFactory interface (after SPARK-12588 Remove HTTPBroadcast) - posted by Jacek Laskowski <ja...@japila.pl> on 2016/11/27 14:23:36 UTC, 0 replies.
- Typo in the programming guide? - posted by Anton Okolnychyi <ok...@gmail.com> on 2016/11/27 17:23:12 UTC, 1 replies.
- Two major versions? - posted by Dongjoon Hyun <do...@apache.org> on 2016/11/27 20:49:29 UTC, 4 replies.
- [VOTE] Apache Spark 2.1.0 (RC1) - posted by Reynold Xin <rx...@databricks.com> on 2016/11/29 01:25:43 UTC, 10 replies.
- Bit-wise AND operation between integers - posted by Nishadi Kirielle <nd...@gmail.com> on 2016/11/29 05:40:33 UTC, 2 replies.
- Please add me - posted by Srinivas Potluri <sh...@gmail.com> on 2016/11/29 17:59:18 UTC, 1 replies.
- Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier) - posted by Michael Allman <mi...@videoamp.com> on 2016/11/30 01:15:46 UTC, 1 replies.
- Question about spark.mllib.GradientDescent - posted by WangJianfei <wa...@otcaix.iscas.ac.cn> on 2016/11/30 02:18:10 UTC, 0 replies.
- Spark-9487, Need some insight - posted by Saikat Kanjilal <sx...@hotmail.com> on 2016/11/30 04:14:01 UTC, 0 replies.
- Proposal for SPARK-18278 - posted by Matt Cheah <mc...@palantir.com> on 2016/11/30 06:22:04 UTC, 0 replies.
- Why don't we imp some adaptive learning rate methods, such as adadelat, adam? - posted by WangJianfei <wa...@otcaix.iscas.ac.cn> on 2016/11/30 08:51:43 UTC, 2 replies.
- [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API - posted by Maciej Szymkiewicz <ms...@gmail.com> on 2016/11/30 16:27:08 UTC, 5 replies.
- Hidden Markov Model or Bayes Networks in Spark - MS Thesis theme - posted by Alex153 <al...@gmail.com> on 2016/11/30 20:30:16 UTC, 0 replies.