user@spark.apache.org, 2016-08

- Re: How to write contents of RDD to HDFS as separate file for each item in RDD (PySpark) - posted by Andrew Ehrlich <an...@aehrlich.com> on 2016/08/01 00:18:29 UTC, 0 replies.
- Re: Tuning level of Parallelism: Increase or decrease? - posted by Andrew Ehrlich <an...@aehrlich.com> on 2016/08/01 00:27:49 UTC, 9 replies.
- Re: Spark recovery takes long - posted by NB <nb...@gmail.com> on 2016/08/01 00:34:05 UTC, 0 replies.
- Windows - Spark 2 - Standalone - Worker not able to connect to Master - posted by ayan guha <gu...@gmail.com> on 2016/08/01 01:24:50 UTC, 3 replies.
- Re: spark java - convert string to date - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/08/01 01:46:59 UTC, 0 replies.
- Re: [Spark 2.0] Why MutableInt cannot be cast to MutableLong? - posted by Chanh Le <gi...@gmail.com> on 2016/08/01 02:21:43 UTC, 1 replies.
- is Hadoop need to be installed? - posted by ayan guha <gu...@gmail.com> on 2016/08/01 03:20:28 UTC, 0 replies.
- Re: Writing custom Transformers and Estimators like Tokenizer in spark ML - posted by janardhan shetty <ja...@gmail.com> on 2016/08/01 03:27:59 UTC, 2 replies.
- Re: sql to spark scala rdd - posted by Sri <ka...@gmail.com> on 2016/08/01 06:04:54 UTC, 5 replies.
- spark.read.format("jdbc") - posted by kevin <ki...@gmail.com> on 2016/08/01 06:30:08 UTC, 4 replies.
- Re: multiple spark streaming contexts - posted by Sumit Khanna <su...@askme.in> on 2016/08/01 06:39:46 UTC, 3 replies.
- Getting error, when I do df.show() - posted by Subhajit Purkayastha <sp...@p3si.net> on 2016/08/01 08:22:08 UTC, 1 replies.
- Re: JettyUtils.createServletHandler Method not Found? - posted by bg_spark <14...@qq.com> on 2016/08/01 08:37:07 UTC, 1 replies.
- Re: spark 2.0 readStream from a REST API - posted by Ayoub Benali <be...@gmail.com> on 2016/08/01 09:01:32 UTC, 6 replies.
- java.net.UnknownHostException - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/01 09:51:25 UTC, 2 replies.
- Testing --supervise flag - posted by Noorul Islam K M <no...@noorul.com> on 2016/08/01 10:51:19 UTC, 1 replies.
- Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. - posted by Ted Yu <yu...@gmail.com> on 2016/08/01 10:55:50 UTC, 0 replies.
- Windows operation orderBy desc - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/08/01 13:56:35 UTC, 2 replies.
- Need Advice: Spark-Streaming Setup - posted by David Kaufman <da...@gmx.de> on 2016/08/01 14:47:09 UTC, 0 replies.
- Spark 2 and Solr - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/08/01 14:56:49 UTC, 0 replies.
- Re: Possible to push sub-queries down into the DataSource impl? - posted by Timothy Potter <th...@gmail.com> on 2016/08/01 15:45:27 UTC, 0 replies.
- Plans for improved Spark DataFrame/Dataset unit testing? - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/08/01 16:02:30 UTC, 6 replies.
- Re: sampling operation for DStream - posted by Martin Le <ma...@gmail.com> on 2016/08/01 16:24:52 UTC, 3 replies.
- Re: Problems initializing SparkUI - posted by Maximiliano Patricio Méndez <mm...@despegar.com> on 2016/08/01 16:44:14 UTC, 9 replies.
- python 'Jupyter' data frame problem with autocompletion - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/08/01 17:08:37 UTC, 0 replies.
- The equivalent for INSTR in Spark FP - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/01 17:24:28 UTC, 10 replies.
- SQL predicate pushdown on parquet or other columnar formats - posted by Sandeep Joshi <sa...@gmail.com> on 2016/08/01 18:17:12 UTC, 1 replies.
- Mlib RandomForest (Spark 2.0) predict a single vector - posted by "itai.efrati" <it...@gmail.com> on 2016/08/01 19:32:45 UTC, 0 replies.
- Re: Java Recipes for Spark - posted by Marco Mistroni <mm...@gmail.com> on 2016/08/01 20:03:17 UTC, 1 replies.
- Spark 2.0 History Server Storage - posted by Andrei Ivanov <ai...@iponweb.net> on 2016/08/01 21:10:48 UTC, 2 replies.
- [MLlib] Term Frequency in TF-IDF seems incorrect - posted by Hao Ren <in...@gmail.com> on 2016/08/01 22:29:23 UTC, 2 replies.
- Re: tpcds for spark2.0 - posted by kevin <ki...@gmail.com> on 2016/08/02 01:49:55 UTC, 0 replies.
- unsubscribe - posted by zhangjp <59...@qq.com> on 2016/08/02 03:00:28 UTC, 4 replies.
- Sqoop On Spark - posted by Selvam Raman <se...@gmail.com> on 2016/08/02 04:52:33 UTC, 1 replies.
- What are using Spark for - posted by Rohit L <ro...@gmail.com> on 2016/08/02 05:48:37 UTC, 8 replies.
- Spark GraphFrames - posted by Divya Gehlot <di...@gmail.com> on 2016/08/02 05:50:47 UTC, 3 replies.
- Does it has a way to config limit in query on STS by default? - posted by Chanh Le <gi...@gmail.com> on 2016/08/02 07:41:25 UTC, 10 replies.
- Application not showing in Spark History - posted by "Rychnovsky, Dusan" <Du...@firma.seznam.cz> on 2016/08/02 08:53:58 UTC, 2 replies.
- issue with coalesce in Spark 2.0.0 - posted by 陈宇航 <yu...@foxmail.com> on 2016/08/02 09:57:05 UTC, 1 replies.
- Are join/groupBy operations with wide Java Beans using Dataset API much slower than using RDD API? - posted by dueckm <ma...@fiduciagad.de> on 2016/08/02 12:12:23 UTC, 2 replies.
- decribe function limit of columns - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/02 13:39:22 UTC, 1 replies.
- Are join/groupBy operations with wide Java Beans using Dataset API much slower than using RDD API? [*] - posted by dueckm <ma...@fiduciagad.de> on 2016/08/02 13:50:46 UTC, 0 replies.
- Extracting key word from a textual column - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/02 14:00:50 UTC, 11 replies.
- Saving input schema along with PipelineModel - posted by Satyanarayan Patel <sa...@gmail.com> on 2016/08/02 15:21:09 UTC, 1 replies.
- Re: spark 1.6.0 read s3 files error. - posted by freedafeng <fr...@yahoo.com> on 2016/08/02 16:26:12 UTC, 2 replies.
- FW: [jupyter] newbie. apache spark python3 'Jupyter' data frame problem with auto completion and accessing documentation - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/08/02 16:57:36 UTC, 0 replies.
- Job can not terminated in Spark 2.0 on Yarn - posted by Liangzhao Zeng <li...@gmail.com> on 2016/08/02 17:06:37 UTC, 6 replies.
- saving data frame to optimize joins at a later time - posted by Cesar <ce...@gmail.com> on 2016/08/02 19:09:00 UTC, 0 replies.
- In 2.0.0, is it possible to fetch a query from an external database (rather than grab the whole table)? - posted by pgb <pb...@doximity.com> on 2016/08/02 20:25:22 UTC, 2 replies.
- [2.0.0] mapPartitions on DataFrame unable to find encoder - posted by Dragisa Krsmanovic <dr...@ticketfly.com> on 2016/08/02 20:55:53 UTC, 4 replies.
- Re: calling dataset.show on a custom object - displays toString() value as first column and blank for rest - posted by Jacek Laskowski <ja...@japila.pl> on 2016/08/02 22:44:47 UTC, 0 replies.
- Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:/// - posted by Utkarsh Sengar <ut...@gmail.com> on 2016/08/02 23:47:38 UTC, 4 replies.
- Stop Spark Streaming Jobs - posted by Pradeep <pr...@mail.com> on 2016/08/03 01:47:58 UTC, 6 replies.
- Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created. - posted by satyajit vegesna <sa...@gmail.com> on 2016/08/03 04:11:10 UTC, 3 replies.
- Relative path in absolute URI - posted by Abhishek Ranjan <ra...@gmail.com> on 2016/08/03 06:54:00 UTC, 1 replies.
- How to get recommand result for users in a kafka SparkStreaming Application - posted by lu...@sina.com on 2016/08/03 07:01:10 UTC, 1 replies.
- Re: How to get recommand result for users in a kafka SparkStreaming Application - posted by lu...@sina.com on 2016/08/03 07:27:47 UTC, 0 replies.
- "object cannot be cast to Double" using pipline with pyspark - posted by colin <co...@sina.cn> on 2016/08/03 08:15:00 UTC, 0 replies.
- spark 2.0.0 - how to build an uber-jar? - posted by lev <ka...@gmail.com> on 2016/08/03 08:43:11 UTC, 3 replies.
- Re: How to partition a SparkDataFrame using all distinct column values in sparkR - posted by Sun Rui <su...@163.com> on 2016/08/03 10:04:58 UTC, 0 replies.
- Spark 2.0 empty result in some tpc-h queries - posted by eviekas <ev...@cslab.ece.ntua.gr> on 2016/08/03 10:55:29 UTC, 0 replies.
- Spark steaming with Flume jobs failing - posted by Bhupendra Mishra <bh...@gmail.com> on 2016/08/03 11:00:31 UTC, 0 replies.
- converting a Dataset into JavaRDD - posted by "Carlo.Allocca" <ca...@open.ac.uk> on 2016/08/03 11:14:12 UTC, 1 replies.
- [Thriftserver2] Controlling number of tasks - posted by Yana Kadiyska <ya...@gmail.com> on 2016/08/03 12:03:06 UTC, 3 replies.
- Managed memory leak detected + OutOfMemoryError: Unable to acquire X bytes of memory, got 0 - posted by "Rychnovsky, Dusan" <Du...@firma.seznam.cz> on 2016/08/03 12:03:15 UTC, 6 replies.
- Spark 2.0 - Case sensitive column names while reading csv - posted by Aseem Bansal <as...@gmail.com> on 2016/08/03 12:04:09 UTC, 0 replies.
- Sparkstreaming not consistently picking files from streaming directory - posted by "ravi.gawai" <ra...@gmail.com> on 2016/08/03 12:55:18 UTC, 0 replies.
- OOM with StringIndexer, 800m rows & 56m distinct value column - posted by Ben Teeuwen <bt...@gmail.com> on 2016/08/03 14:00:04 UTC, 10 replies.
- [SQL] Reading from hive table is listing all files in S3 - posted by Mehdi Meziane <me...@ldmobile.net> on 2016/08/03 14:03:18 UTC, 4 replies.
- Python : StreamingContext isn't created properly - posted by Paolo Patierno <pp...@live.com> on 2016/08/03 14:16:22 UTC, 1 replies.
- SparkSession for RDBMS - posted by Selvam Raman <se...@gmail.com> on 2016/08/03 14:19:20 UTC, 1 replies.
- Error in building spark core on windows - any suggestions please - posted by Tony Lane <to...@gmail.com> on 2016/08/03 14:30:55 UTC, 2 replies.
- How does MapWithStateRDD distribute the data - posted by Soumitra Johri <so...@gmail.com> on 2016/08/03 14:42:30 UTC, 2 replies.
- Change nullable property in Dataset schema - posted by Kazuaki Ishizaki <IS...@jp.ibm.com> on 2016/08/03 14:45:06 UTC, 5 replies.
- Calling KmeansModel predict method - posted by Rohit Chaddha <ro...@gmail.com> on 2016/08/03 16:24:17 UTC, 1 replies.
- Re: How to generate a sequential key in rdd across executors - posted by yeshwanth kumar <ye...@gmail.com> on 2016/08/03 16:35:56 UTC, 0 replies.
- Dataset and JavaRDD: how to eliminate the header. - posted by "Carlo.Allocca" <ca...@open.ac.uk> on 2016/08/03 16:44:59 UTC, 5 replies.
- Spark 2.0: Task never completes - posted by Utkarsh Sengar <ut...@gmail.com> on 2016/08/03 17:04:30 UTC, 1 replies.
- Re: Dataset and JavaRDD: how to eliminate the header. - posted by Aseem Bansal <as...@gmail.com> on 2016/08/03 17:13:55 UTC, 3 replies.
- Using sparse vector leads to array out of bounds exception - posted by Tony Lane <to...@gmail.com> on 2016/08/03 17:43:38 UTC, 5 replies.
- java.net.URISyntaxException: Relative path in absolute URI: - posted by Flavio <ma...@gmail.com> on 2016/08/03 18:05:01 UTC, 6 replies.
- Re: how to use spark.mesos.constraints - posted by Michael Gummelt <mg...@mesosphere.io> on 2016/08/04 00:01:27 UTC, 0 replies.
- Re: Executors assigned to STS and number of workers in Stand Alone Mode - posted by Michael Gummelt <mg...@mesosphere.io> on 2016/08/04 00:27:40 UTC, 1 replies.
- Re: standalone mode only supports FIFO scheduler across applications ? still in spark 2.0 time ? - posted by Michael Gummelt <mg...@mesosphere.io> on 2016/08/04 00:30:14 UTC, 0 replies.
- how to debug spark app? - posted by glen <cn...@126.com> on 2016/08/04 01:13:07 UTC, 3 replies.
- PermGen space Error - posted by $iddhe$h Divekar <si...@gmail.com> on 2016/08/04 03:14:22 UTC, 1 replies.
- 2.0.0 packages for twitter streaming, flume and other connectors - posted by Kiran Chitturi <ki...@lucidworks.com> on 2016/08/04 03:40:58 UTC, 4 replies.
- Spark 2.0 - make-distribution fails while regular build succeeded - posted by Richard Siebeling <rs...@gmail.com> on 2016/08/04 06:09:27 UTC, 3 replies.
- how to run local[k] threads on a single core - posted by sujeet jog <su...@gmail.com> on 2016/08/04 06:27:35 UTC, 3 replies.
- How to connect Power BI to Apache Spark on local machine? - posted by "Devi P.V" <de...@gmail.com> on 2016/08/04 06:54:16 UTC, 0 replies.
- Explanation regarding Spark Streaming - posted by Saurav Sinha <sa...@gmail.com> on 2016/08/04 06:56:36 UTC, 10 replies.
- Spark SQL and number of task - posted by Marco Colombo <in...@gmail.com> on 2016/08/04 07:58:44 UTC, 5 replies.
- Questions about ml.random forest (only one decision tree?) - posted by 陈哲 <cz...@gmail.com> on 2016/08/04 08:48:13 UTC, 1 replies.
- [Spark 2.0] How to optimise the query that do shuffle alot? - posted by Chanh Le <gi...@gmail.com> on 2016/08/04 09:23:43 UTC, 1 replies.
- pycharm and pyspark on windows - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/04 09:45:40 UTC, 0 replies.
- SPARKSQL with HiveContext My job fails - posted by Vasu Devan <va...@gmail.com> on 2016/08/04 09:49:54 UTC, 1 replies.
- [Spark 2.0] Problem with Spark Thrift Server show NULL instead of showing BIGINT value - posted by Chanh Le <gi...@gmail.com> on 2016/08/04 10:35:13 UTC, 5 replies.
- How to avoid sql injection on SparkSQL? - posted by Linyuxin <li...@huawei.com> on 2016/08/04 11:59:58 UTC, 0 replies.
- Using Spark 2.0 inside Docker - posted by mhornbech <mo...@datasolvr.com> on 2016/08/04 13:07:39 UTC, 0 replies.
- registering udf to use in spark.sql('select... - posted by Ben Teeuwen <bt...@gmail.com> on 2016/08/04 13:10:00 UTC, 4 replies.
- source code for org.spark-project.hive - posted by prabhat__ <pr...@gmail.com> on 2016/08/04 13:23:10 UTC, 2 replies.
- Add column sum as new column in PySpark dataframe - posted by Javier Rey <jr...@gmail.com> on 2016/08/04 13:38:31 UTC, 4 replies.
- num-executors, executor-memory and executor-cores parameters - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/08/04 13:39:02 UTC, 1 replies.
- WindowsError: [Error 2] The system cannot find the file specified - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/04 14:01:48 UTC, 1 replies.
- Spark jobs failing due to java.lang.OutOfMemoryError: PermGen space - posted by $iddhe$h Divekar <si...@gmail.com> on 2016/08/04 14:34:41 UTC, 3 replies.
- Raleigh, Durham, and around... - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/08/04 14:39:59 UTC, 0 replies.
- Symbol HasInputCol is inaccesible from this place - posted by janardhan shetty <ja...@gmail.com> on 2016/08/04 20:18:25 UTC, 7 replies.
- How to set nullable field when create DataFrame using case class - posted by luismattor <lu...@gmail.com> on 2016/08/04 21:56:59 UTC, 7 replies.
- Re: Writing all values for same key to one file - posted by rtijoriwala <ti...@gmail.com> on 2016/08/05 00:10:35 UTC, 5 replies.
- Re: Spark SQL Hive Authorization - posted by "arin.g" <ar...@yahoo.com> on 2016/08/05 00:39:00 UTC, 0 replies.
- singular value decomposition in Spark ML - posted by Sandy Ryza <sa...@gmail.com> on 2016/08/05 01:41:00 UTC, 0 replies.
- Spark 1.6 Streaming delay after long run - posted by Chan Chor Pang <ch...@indetail.co.jp> on 2016/08/05 01:51:24 UTC, 0 replies.
- [Spark1.6]:compare rows and add new column based on lookup - posted by Divya Gehlot <di...@gmail.com> on 2016/08/05 02:16:44 UTC, 3 replies.
- Regression in Java RDD sortBy() in Spark 2.0 - posted by Andy Grove <an...@agildata.com> on 2016/08/05 04:25:13 UTC, 1 replies.
- Java and SparkSession - posted by Andy Grove <an...@agildata.com> on 2016/08/05 04:41:36 UTC, 2 replies.
- pyspark pickle error when using itertools.groupby - posted by 林家銘 <ro...@gmail.com> on 2016/08/05 05:31:09 UTC, 1 replies.
- Bug: Spark Streaming Application Failure Recovery Failed on Windows - posted by 张梓轩 <zi...@gmail.com> on 2016/08/05 07:20:41 UTC, 0 replies.
- Dataframe insertInto(tableName: String): Unit :Failure Scenario - posted by java bigdata <ha...@gmail.com> on 2016/08/05 07:49:26 UTC, 0 replies.
- What is "Developer API " in spark documentation? - posted by Aseem Bansal <as...@gmail.com> on 2016/08/05 09:55:43 UTC, 1 replies.
- Spark 2.0.0 - Apply schema on few columns of dataset - posted by Aseem Bansal <as...@gmail.com> on 2016/08/05 12:06:11 UTC, 7 replies.
- Generating unique id for a column in Row without breaking into RDD and joining back - posted by Tony Lane <to...@gmail.com> on 2016/08/05 12:14:15 UTC, 14 replies.
- pyspark on pycharm on WINDOWS - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/05 13:35:59 UTC, 1 replies.
- submitting spark job with kerberized Hadoop issue - posted by Aneela Saleem <an...@platalytics.com> on 2016/08/05 13:54:14 UTC, 10 replies.
- [Spark 2.0] Error during codegen for Java POJO - posted by Andy Grove <an...@agildata.com> on 2016/08/05 14:28:48 UTC, 1 replies.
- ClassNotFoundException org.apache.spark.Logging - posted by "Carlo.Allocca" <ca...@open.ac.uk> on 2016/08/05 16:53:43 UTC, 4 replies.
- Re: ClassNotFoundException org.apache.spark.Logging - posted by Ted Yu <yu...@gmail.com> on 2016/08/05 16:58:00 UTC, 3 replies.
- Question: collect action returning to driver - posted by RK Aduri <rk...@collectivei.com> on 2016/08/05 17:12:30 UTC, 0 replies.
- Avoid Cartesian product in calculating a distance matrix? - posted by Paschalis Veskos <ve...@gmail.com> on 2016/08/05 17:20:09 UTC, 2 replies.
- Applying a limit after orderBy of big dataframe hangs spark - posted by Sa...@wellsfargo.com on 2016/08/05 18:54:05 UTC, 2 replies.
- flume.thrift.queuesize - posted by Bhupendra Mishra <bh...@gmail.com> on 2016/08/05 19:25:23 UTC, 0 replies.
- Kmeans dataset initialization - posted by Tony Lane <to...@gmail.com> on 2016/08/05 19:33:35 UTC, 1 replies.
- spark historyserver backwards compatible - posted by Koert Kuipers <ko...@tresata.com> on 2016/08/05 21:14:46 UTC, 2 replies.
- 2.0.0: AnalysisException when reading csv/json files with dots in periods - posted by Kiran Chitturi <ki...@lucidworks.com> on 2016/08/06 00:33:30 UTC, 1 replies.
- 2.0.0: Hive metastore uses a different version of derby than the Spark package - posted by Kiran Chitturi <ki...@lucidworks.com> on 2016/08/06 00:45:02 UTC, 0 replies.
- Re: mapWithState handle timeout - posted by jackerli <hu...@gmail.com> on 2016/08/06 07:57:48 UTC, 2 replies.
- Dropping late date in Structured Streaming - posted by Amit Sela <am...@gmail.com> on 2016/08/06 13:40:52 UTC, 1 replies.
- Help testing the Spark Extensions for the Apache Bahir 2.0.0 release - posted by Luciano Resende <lu...@gmail.com> on 2016/08/06 17:18:07 UTC, 2 replies.
- Long running tasks in stages - posted by Deepak Sharma <de...@gmail.com> on 2016/08/06 17:31:24 UTC, 0 replies.
- Dataframe / Dataset partition size... - posted by Muthu Jayakumar <ba...@gmail.com> on 2016/08/06 19:56:24 UTC, 1 replies.
- Spark Application Counters Using Rest API - posted by Muhammad Haris <mu...@gmail.com> on 2016/08/06 20:44:35 UTC, 0 replies.
- spark df schema to hive schema converter func - posted by Sumit Khanna <su...@askme.in> on 2016/08/07 05:03:37 UTC, 1 replies.
- [Spark1.6] Or (||) operator not working in DataFrame - posted by Divya Gehlot <di...@gmail.com> on 2016/08/07 14:43:06 UTC, 6 replies.
- Sorting a DStream and taking topN - posted by Ahmed El-Gamal <ah...@badrit.com> on 2016/08/07 14:43:56 UTC, 1 replies.
- Accessing HBase through Spark with Security enabled - posted by Aneela Saleem <an...@platalytics.com> on 2016/08/07 16:02:54 UTC, 7 replies.
- [SPARK-2.0][SQL] UDF containing non-serializable object does not work as expected - posted by Hao Ren <in...@gmail.com> on 2016/08/07 21:31:33 UTC, 5 replies.
- Using Kyro for DataFrames (Dataset)? - posted by Jestin Ma <je...@gmail.com> on 2016/08/07 22:31:34 UTC, 0 replies.
- Random forest binary classification H20 difference Spark - posted by Javier Rey <jr...@gmail.com> on 2016/08/08 02:51:08 UTC, 2 replies.
- silence the spark debug logs - posted by Sumit Khanna <su...@askme.in> on 2016/08/08 04:09:17 UTC, 1 replies.
- Any exceptions during an action doesn't fail the Spark streaming batch in yarn-client mode - posted by Hemalatha A <he...@googlemail.com> on 2016/08/08 04:44:20 UTC, 1 replies.
- map vs mapPartitions - posted by rtijoriwala <ti...@gmail.com> on 2016/08/08 06:20:24 UTC, 0 replies.
- Is Spark right for my use case? - posted by danellis <da...@danellis.me> on 2016/08/08 06:22:37 UTC, 1 replies.
- Spark 2.0.0 - Broadcast variable - What is ClassTag? - posted by Aseem Bansal <as...@gmail.com> on 2016/08/08 06:32:10 UTC, 3 replies.
- hdfs persist rollbacks when spark job is killed - posted by Sumit Khanna <su...@askme.in> on 2016/08/08 06:35:34 UTC, 4 replies.
- What are the configurations needs to connect spark and ms-sql server? - posted by "Devi P.V" <de...@gmail.com> on 2016/08/08 07:44:22 UTC, 1 replies.
- Spark driver memory keeps growing - posted by Pierre Villard <pi...@gmail.com> on 2016/08/08 08:17:53 UTC, 0 replies.
- why spark 2 shell console still sending warnings despite setting log4j.rootCategory=ERROR, console - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/08 08:33:33 UTC, 1 replies.
- how to generate a column using mapParition and then add it back to the df? - posted by MoTao <mo...@sensetime.com> on 2016/08/08 09:00:51 UTC, 1 replies.
- Multiple Sources Found for Parquet - posted by 金国栋 <sc...@gmail.com> on 2016/08/08 09:34:17 UTC, 2 replies.
- Re: how to generate a column using mapParition and then add it back to the df? - posted by 莫涛 <mo...@sensetime.com> on 2016/08/08 09:44:13 UTC, 1 replies.
- Machine learning question (suing spark)- removing redundant factors while doing clustering - posted by Rohit Chaddha <ro...@gmail.com> on 2016/08/08 11:42:22 UTC, 11 replies.
- Spark 2 and existing code with sqlContext - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/08 13:11:23 UTC, 3 replies.
- FW: Have I done everything correctly when subscribing to Spark User List - posted by Chris Mattmann <ma...@apache.org> on 2016/08/08 14:03:59 UTC, 4 replies.
- zip for pyspark - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/08 14:44:31 UTC, 1 replies.
- Unsubscribe - posted by bi...@gmail.com on 2016/08/08 14:46:15 UTC, 8 replies.
- Best practises around spark-scala - posted by Deepak Sharma <de...@gmail.com> on 2016/08/08 15:11:06 UTC, 2 replies.
- Source format for Apache Spark logo - posted by Mi...@gdata-adan.de on 2016/08/08 16:24:46 UTC, 2 replies.
- using matrix as column datatype in SparkSQL Dataframe - posted by "Vadla, Karthik" <ka...@intel.com> on 2016/08/08 18:06:47 UTC, 1 replies.
- Spark join and large temp files - posted by Ashic Mahtab <as...@live.com> on 2016/08/08 18:17:29 UTC, 16 replies.
- Getting a TreeNode Exception while saving into Hadoop - posted by max square <ma...@gmail.com> on 2016/08/08 18:47:34 UTC, 8 replies.
- SPARK SQL READING FROM HIVE - posted by manish jaiswal <ma...@gmail.com> on 2016/08/08 18:48:04 UTC, 6 replies.
- java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0) - posted by Zoltan Fedor <zo...@gmail.com> on 2016/08/08 21:24:58 UTC, 4 replies.
- Issue with temporary table in Spark 2 - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/08 22:14:47 UTC, 0 replies.
- Logistic regression formula string - posted by Cesar <ce...@gmail.com> on 2016/08/08 22:53:24 UTC, 1 replies.
- Cumulative Sum function using Dataset API - posted by jon <jo...@gmail.com> on 2016/08/08 23:53:12 UTC, 6 replies.
- Re: Re: how to generate a column using mapParition and then add it back to the df? - posted by 莫涛 <mo...@sensetime.com> on 2016/08/09 02:14:28 UTC, 0 replies.
- SparkR error when repartition is called - posted by Shane Lee <sh...@yahoo.com.INVALID> on 2016/08/09 03:35:17 UTC, 3 replies.
- How to get the parameters of bestmodel while using paramgrid and crossvalidator? - posted by colin <co...@sina.cn> on 2016/08/09 06:09:17 UTC, 0 replies.
- coalesce serialising earlier work - posted by Adrian Bridgett <ad...@opensignal.com> on 2016/08/09 07:11:13 UTC, 1 replies.
- saving DF to HDFS in parquet format very slow in SparkSQL app - posted by lu...@sina.com on 2016/08/09 07:34:43 UTC, 0 replies.
- Spark-2.0.0 fails reading a parquet dataset generated by Spark-1.6.2 - posted by immerrr again <im...@gmail.com> on 2016/08/09 08:10:30 UTC, 4 replies.
- RE: bisecting kmeans model tree - posted by "Huang, Qian" <qi...@intel.com> on 2016/08/09 08:24:23 UTC, 0 replies.
- Spark Job Doesn't End on Mesos - posted by Todd Leo <to...@gmail.com> on 2016/08/09 08:28:42 UTC, 1 replies.
- Re: saving DF to HDFS in parquet format very slow in SparkSQL app - posted by lu...@sina.com on 2016/08/09 09:28:10 UTC, 0 replies.
- Spark Streaming Job Keeps growing memory over time - posted by "aasish.kumar" <aa...@avekshaa.com> on 2016/08/09 10:21:12 UTC, 3 replies.
- Get distinct column data from grouped data - posted by Selvam Raman <se...@gmail.com> on 2016/08/09 10:49:10 UTC, 1 replies.
- [Spark 1.6]-increment value column based on condition + Dataframe - posted by Divya Gehlot <di...@gmail.com> on 2016/08/09 12:34:00 UTC, 0 replies.
- OrientDB through JDBC: query field names wrapped by double quote - posted by Roberto Franchini <ro...@gmail.com> on 2016/08/09 12:36:07 UTC, 0 replies.
- update specifc rows to DB using sqlContext - posted by sujeet jog <su...@gmail.com> on 2016/08/09 12:39:30 UTC, 6 replies.
- Spark 1.6.1 and regexp_replace - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/08/09 16:18:19 UTC, 0 replies.
- DataFrame equivalent to RDD.partionByKey - posted by Stephen Fletcher <st...@gmail.com> on 2016/08/09 16:36:16 UTC, 1 replies.
- Spark 2.0.1 / 2.1.0 on Maven - posted by Jestin Ma <je...@gmail.com> on 2016/08/09 16:55:22 UTC, 7 replies.
- Sparking Water (Spark 1.6.0 + H2O 3.8.2.6 ) on CDH 5.7.1 - posted by RK Aduri <rk...@collectivei.com> on 2016/08/09 17:07:53 UTC, 0 replies.
- Spark on mesos in docker not getting parameters - posted by Jim Carroll <ji...@gmail.com> on 2016/08/09 17:13:18 UTC, 1 replies.
- Unsubscribe. - posted by Martin Somers <so...@gmail.com> on 2016/08/09 19:05:36 UTC, 1 replies.
- spark 2.0 in intellij - posted by Michael Jay <mi...@outlook.com> on 2016/08/09 20:11:12 UTC, 1 replies.
- Spark streaming not processing messages from partitioned topics - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/08/09 20:47:25 UTC, 13 replies.
- UNSUBSCRIBE - posted by abhishek singh <ab...@gmail.com> on 2016/08/09 21:14:33 UTC, 7 replies.
- Spark 1.6.2 can read hive tables created with sqoop, but Spark 2.0.0 cannot - posted by cdecleene <cd...@allstate.com> on 2016/08/09 21:32:04 UTC, 4 replies.
- Spark SQL -JDBC connectivity - posted by Soni spark <so...@gmail.com> on 2016/08/10 05:27:04 UTC, 0 replies.
- Re: Spark Thrift Server (Spark 2.0) show table has value with NULL in all fields - posted by Chanh Le <gi...@gmail.com> on 2016/08/10 07:54:52 UTC, 1 replies.
- Please help: Spark job hung/stop writing after exceeding the folder size - posted by Bhupendra Mishra <bh...@gmail.com> on 2016/08/10 09:10:41 UTC, 0 replies.
- Running spark Java on yarn cluster - posted by Atul Phalke <at...@gmail.com> on 2016/08/10 10:25:57 UTC, 2 replies.
- Spark SQL Parallelism - While reading from Oracle - posted by Siva A <si...@gmail.com> on 2016/08/10 10:35:59 UTC, 1 replies.
- suggestion needed on FileInput Path- Spark Streaming - posted by md...@gmail.com on 2016/08/10 13:02:06 UTC, 0 replies.
- Caching an RDD and expected number of partitions in Spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/10 13:22:05 UTC, 0 replies.
- Use cases around image/video processing in spark - posted by Deepak Sharma <de...@gmail.com> on 2016/08/10 15:20:39 UTC, 0 replies.
- Standardization with Sparse Vectors - posted by Tobi Bosede <an...@gmail.com> on 2016/08/10 15:41:51 UTC, 11 replies.
- Changing Spark configuration midway through application. - posted by Jestin Ma <je...@gmail.com> on 2016/08/10 15:47:31 UTC, 1 replies.
- Simulate serialization when running local - posted by Ashic Mahtab <as...@live.com> on 2016/08/10 17:24:26 UTC, 2 replies.
- Spark2 SBT Assembly - posted by Efe Selcuk <ef...@gmail.com> on 2016/08/10 17:39:46 UTC, 8 replies.
- org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 - posted by شجاع الرحمن بیگ <sh...@gmail.com> on 2016/08/10 19:34:20 UTC, 0 replies.
- Re: Spark submit job that points to URL of a jar - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/10 21:50:50 UTC, 0 replies.
- Is there a reduceByKey functionality in DataFrame API? - posted by luismattor <lu...@gmail.com> on 2016/08/11 00:15:01 UTC, 4 replies.
- na.fill doesn't work - posted by Javier Rey <jr...@gmail.com> on 2016/08/11 00:58:57 UTC, 2 replies.
- groupByKey() compile error after upgrading from 1.6.2 to 2.0.0 - posted by Arun Luthra <ar...@gmail.com> on 2016/08/11 03:07:59 UTC, 2 replies.
- SparkML RandomForest - posted by Pengcheng <pc...@gmail.com> on 2016/08/11 03:42:00 UTC, 0 replies.
- MulticlassClassificationEvaluator use - posted by 陈哲 <cz...@gmail.com> on 2016/08/11 04:00:04 UTC, 0 replies.
- Can't generate model for prediction - posted by Zakaria Hili <za...@gmail.com> on 2016/08/11 08:18:39 UTC, 1 replies.
- Spark 2.0.0 - Java API - Modify a column in a dataframe - posted by Aseem Bansal <as...@gmail.com> on 2016/08/11 12:28:08 UTC, 1 replies.
- Re: Spark excludes "fastutil" dependencies we need - posted by cryptoe <ka...@gmail.com> on 2016/08/11 12:59:58 UTC, 0 replies.
- dataframe row list question - posted by vr spark <vr...@gmail.com> on 2016/08/11 14:54:54 UTC, 2 replies.
- Why training data in Kmeans Spark streaming clustering - posted by Ahmed Sadek <do...@gmail.com> on 2016/08/11 16:14:47 UTC, 1 replies.
- Re: HiveThriftServer and spark.sql.hive.thriftServer.singleSession setting - posted by Richard M <ri...@gmail.com> on 2016/08/11 16:23:39 UTC, 2 replies.
- Re: Table registered using registerTempTable not found in HiveContext - posted by Richard M <ri...@gmail.com> on 2016/08/11 16:27:21 UTC, 1 replies.
- Spark 1.6.2 HiveServer2 cannot access temp tables - posted by Richard M <ri...@gmail.com> on 2016/08/11 16:30:18 UTC, 0 replies.
- Spark 2 cannot create ORC table when CLUSTERED. This worked in Spark 1.6.1 - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/11 17:02:24 UTC, 3 replies.
- Single point of failure with Driver host crashing - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/11 19:40:53 UTC, 3 replies.
- DataFramesWriter saving DataFrames timestamp in weird format - posted by Jestin Ma <je...@gmail.com> on 2016/08/11 21:04:30 UTC, 1 replies.
- Losing executors due to memory problems - posted by "Muttineni, Vinay" <vm...@ebay.com> on 2016/08/11 23:41:27 UTC, 2 replies.
- Log messages for shuffle phase - posted by Suman Somasundar <su...@oracle.com> on 2016/08/12 01:28:06 UTC, 1 replies.
- type inference csv dates - posted by Koert Kuipers <ko...@tresata.com> on 2016/08/12 02:07:00 UTC, 0 replies.
- KafkaUtils.createStream not picking smallest offset - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/08/12 08:35:02 UTC, 0 replies.
- Spark's Logistic Regression runs unstable on Yarn cluster - posted by olivierjeunen <ol...@gmail.com> on 2016/08/12 10:08:55 UTC, 1 replies.
- [Spark 2.0] spark.sql.hive.metastore.jars doesn't work - posted by "颜发才 (Yan Facai)" <ya...@gmail.com> on 2016/08/12 10:28:14 UTC, 0 replies.
- PySpark read from HBase - posted by Bin Wang <bi...@gmail.com> on 2016/08/12 15:18:13 UTC, 0 replies.
- Grid Search using Spark MLLib Pipelines - posted by Adamantios Corais <ad...@gmail.com> on 2016/08/12 16:17:19 UTC, 2 replies.
- How to add custom steps to Pipeline models? - posted by evanzamir <za...@gmail.com> on 2016/08/12 16:19:42 UTC, 3 replies.
- Mailing list - posted by Inam Ur Rehman <in...@gmail.com> on 2016/08/12 17:46:17 UTC, 0 replies.
- Unable to run spark examples in eclipse - posted by subash basnet <ya...@gmail.com> on 2016/08/12 17:49:39 UTC, 4 replies.
- countDistinct, partial aggregates and Spark 2.0 - posted by Lee Becker <le...@hapara.com> on 2016/08/12 17:55:14 UTC, 1 replies.
- Re: KafkaUtils.createStream not picking smallest offset - posted by Cody Koeninger <co...@koeninger.org> on 2016/08/12 18:12:17 UTC, 1 replies.
- Re: Rebalancing when adding kafka partitions - posted by Srikanth <sr...@gmail.com> on 2016/08/12 19:47:28 UTC, 3 replies.
- Spark 2.0.0 JaninoRuntimeException - posted by Aris <ar...@gmail.com> on 2016/08/12 20:33:26 UTC, 9 replies.
- Using spark package XGBoost - posted by janardhan shetty <ja...@gmail.com> on 2016/08/12 22:35:12 UTC, 4 replies.
- restart spark streaming app - posted by Shifeng Xiao <xi...@gmail.com> on 2016/08/12 23:38:25 UTC, 1 replies.
- Flattening XML in a DataFrame - posted by Sreekanth Jella <sr...@gmail.com> on 2016/08/13 00:33:48 UTC, 5 replies.
- Why I can't use broadcast var defined in a global object? - posted by yaochunnan <ya...@gmail.com> on 2016/08/13 00:40:19 UTC, 3 replies.
- Accessing SparkConfig from mapWithState function - posted by "Govindasamy, Nagarajan" <ng...@turbine.com> on 2016/08/13 01:15:22 UTC, 0 replies.
- Spark Streaming fault tolerance benchmark - posted by Dominik Safaric <do...@gmail.com> on 2016/08/13 14:50:12 UTC, 0 replies.
- Spark stage concurrency - posted by Mazen <ma...@gmail.com> on 2016/08/13 15:17:44 UTC, 0 replies.
- call a mysql stored procedure from spark - posted by sujeet jog <su...@gmail.com> on 2016/08/13 17:40:03 UTC, 6 replies.
- mesos or kubernetes ? - posted by guyoh <g1...@gmail.com> on 2016/08/13 18:24:22 UTC, 6 replies.
- [SQL] Why does (0 to 9).toDF("num").as[String] work? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/08/13 20:17:54 UTC, 4 replies.
- Does Spark SQL support indexes? - posted by "Taotao.Li" <ch...@gmail.com> on 2016/08/14 03:03:51 UTC, 10 replies.
- How Spark sql query optimisation work if we are using .rdd action ? - posted by mayur bhole <ma...@gmail.com> on 2016/08/14 05:04:10 UTC, 2 replies.
- - posted by Jestin Ma <je...@gmail.com> on 2016/08/14 06:15:43 UTC, 8 replies.
- Issue with compiling Scala with Spark 2 - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/14 15:58:23 UTC, 11 replies.
- parallel processing with JDBC - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/08/14 19:50:08 UTC, 12 replies.
- Re: Role-based S3 access outside of EMR - posted by Steve Loughran <st...@hortonworks.com> on 2016/08/14 20:15:24 UTC, 0 replies.
- spark ml : auc on extreme distributed data - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/08/15 04:11:39 UTC, 1 replies.
- Can not find usage of classTag variable defined in abstract class AtomicType in spark project - posted by Andy Zhao <an...@gmail.com> on 2016/08/15 08:30:26 UTC, 0 replies.
- Linear regression, weights constraint - posted by letaiv <tl...@gmail.com> on 2016/08/15 10:53:03 UTC, 2 replies.
- Submitting jobs to YARN from outside EMR -- config & S3 impl - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/08/15 16:20:52 UTC, 0 replies.
- class not found exception Logging while running JavaKMeansExample - posted by subash basnet <ya...@gmail.com> on 2016/08/15 16:48:24 UTC, 4 replies.
- Sum array values by row in new column - posted by Javier Rey <jr...@gmail.com> on 2016/08/15 17:02:27 UTC, 3 replies.
- how to do nested loops over 2 arrays but use Two RDDs instead ? - posted by Eric Ho <er...@analyticsmd.com> on 2016/08/15 18:12:19 UTC, 1 replies.
- Number of tasks on executors become negative after executor failures - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/08/15 19:13:58 UTC, 1 replies.
- GraphX build from JSON input - posted by Gerard Casey <ge...@gmail.com> on 2016/08/15 20:00:49 UTC, 1 replies.
- How to do nested for-each loops across RDDs ? - posted by Eric Ho <er...@analyticsmd.com> on 2016/08/15 20:15:30 UTC, 2 replies.
- Spark 2.0.0 OOM error at beginning of RDD map on AWS - posted by Arun Luthra <ar...@gmail.com> on 2016/08/15 21:12:11 UTC, 3 replies.
- read kafka offset from spark checkpoint - posted by Shifeng Xiao <xi...@gmail.com> on 2016/08/15 21:14:00 UTC, 1 replies.
- [ANNOUNCE] Apache Bahir 2.0.0 - posted by Luciano Resende <lr...@apache.org> on 2016/08/15 21:19:17 UTC, 3 replies.
- SizeEstimator for python - posted by Maurin Lenglart <ma...@cuberonlabs.com> on 2016/08/15 22:09:49 UTC, 0 replies.
- Spark Yarn executor container memory - posted by Lan Jiang <lj...@gmail.com> on 2016/08/16 02:26:04 UTC, 1 replies.
- Re: Apache Spark toDebugString producing different output for python and scala repl - posted by DEEPAK SHARMA <de...@outlook.com> on 2016/08/16 02:58:14 UTC, 1 replies.
- MLIB and R results do not match for SVD - posted by roni <ro...@gmail.com> on 2016/08/16 07:08:58 UTC, 0 replies.
- java.lang.UnsupportedOperationException: Cannot evaluate expression: fun_nm(input[0, string, true]) - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/16 08:50:01 UTC, 2 replies.
- Data frame Performance - posted by Selvam Raman <se...@gmail.com> on 2016/08/16 11:06:12 UTC, 2 replies.
- Spark Executor Metrics - posted by Muhammad Haris <mu...@gmail.com> on 2016/08/16 11:48:03 UTC, 2 replies.
- long lineage - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/16 12:50:34 UTC, 1 replies.
- pyspark 1.5.0 after three joins --> stackoverflow - posted by pseudo oduesp <ps...@gmail.com> on 2016/08/16 13:06:39 UTC, 0 replies.
- GraphFrames 0.2.0 released - posted by Tim Hunter <ti...@databricks.com> on 2016/08/16 16:32:55 UTC, 3 replies.
- VectorUDT with spark.ml.linalg.Vector - posted by alexeys <al...@princeton.edu> on 2016/08/16 16:48:07 UTC, 4 replies.
- DataFrame use case - posted by jtgenesis <jt...@gmail.com> on 2016/08/16 17:32:50 UTC, 1 replies.
- Large where clause StackOverflow 1.5.2 - posted by rachmaninovquartet <ra...@gmail.com> on 2016/08/16 18:45:48 UTC, 1 replies.
- Anyone else having trouble with replicated off heap RDD persistence? - posted by Michael Allman <mi...@videoamp.com> on 2016/08/16 22:45:14 UTC, 1 replies.
- Can't connect to remote spark standalone cluster: getting WARN TaskSchedulerImpl: Initial job has not accepted any resources - posted by Andrew Vykhodtsev <yo...@gmail.com> on 2016/08/16 23:24:16 UTC, 0 replies.
- SPARK MLLib - How to tie back Model.predict output to original data? - posted by ayan guha <gu...@gmail.com> on 2016/08/17 00:48:37 UTC, 5 replies.
- [SQL] Why does spark.read.csv.cache give me a WARN about cache but not text?! - posted by Jacek Laskowski <ja...@japila.pl> on 2016/08/17 01:05:17 UTC, 2 replies.
- create SparkSession without loading defaults for unit tests - posted by Koert Kuipers <ko...@tresata.com> on 2016/08/17 02:23:20 UTC, 0 replies.
- JavaRDD to DataFrame fails with null pointer exception in 1.6.0 - posted by spats <sp...@gmail.com> on 2016/08/17 04:28:55 UTC, 2 replies.
- UDF in SparkR - posted by Yogesh Vyas <in...@gmail.com> on 2016/08/17 06:38:53 UTC, 2 replies.
- Spark MLlib question: load model failed with exception:org.json4s.package$MappingException: Did not find value which can be converted into java.lang.String - posted by lu...@sina.com on 2016/08/17 08:30:14 UTC, 1 replies.
- Spark standalone or Yarn for resourcing - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/08/17 08:35:50 UTC, 0 replies.
- How to implement a InputDStream like the twitter stream in Spark? - posted by Xi Shen <da...@gmail.com> on 2016/08/17 09:39:13 UTC, 0 replies.
- Aggregations with scala pairs - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/08/17 14:01:24 UTC, 3 replies.
- Undefined function json_array_to_map - posted by vr spark <vr...@gmail.com> on 2016/08/17 15:46:20 UTC, 2 replies.
- Attempting to accept an unknown offer - posted by vr spark <vr...@gmail.com> on 2016/08/17 15:46:30 UTC, 4 replies.
- pyspark.sql.functions.last not working as expected - posted by Alexander Peletz <al...@slalom.com> on 2016/08/17 15:56:32 UTC, 2 replies.
- Re: Spark DF CacheTable method. Will it save data to disk? - posted by neil90 <ne...@icloud.com> on 2016/08/17 15:56:48 UTC, 1 replies.
- [Community] Python support added to Spark Job Server - posted by Evan Chan <ve...@gmail.com> on 2016/08/17 17:04:03 UTC, 1 replies.
- Spark ML : One hot Encoding for multiple columns - posted by janardhan shetty <ja...@gmail.com> on 2016/08/17 17:18:00 UTC, 4 replies.
- Extract year from string format of date - posted by Selvam Raman <se...@gmail.com> on 2016/08/17 17:18:20 UTC, 0 replies.
- error when running spark from oozie launcher - posted by tkg_cangkul <yu...@gmail.com> on 2016/08/17 17:24:08 UTC, 4 replies.
- How to combine two DStreams(pyspark)? - posted by vidhan <vi...@kitboard.co> on 2016/08/17 20:40:00 UTC, 1 replies.
- Spark SQL 1.6.1 issue - posted by thbeh <th...@thbeh.com> on 2016/08/18 04:05:10 UTC, 2 replies.
- spark streaming Directkafka with checkpointing : changed parameters not considered - posted by chandan prakash <ch...@gmail.com> on 2016/08/18 07:40:01 UTC, 6 replies.
- Converting Dataframe to resultSet in Spark Java - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/08/18 07:56:52 UTC, 1 replies.
- How to Improve Random Forest classifier accuracy - posted by 陈哲 <cz...@gmail.com> on 2016/08/18 08:25:35 UTC, 2 replies.
- GraphX VerticesRDD issue - java.lang.ArrayStoreException: java.lang.Long - posted by Gerard Casey <ge...@gmail.com> on 2016/08/18 08:53:04 UTC, 0 replies.
- 2.0.1/2.1.x release dates - posted by Adrian Bridgett <ad...@opensignal.com> on 2016/08/18 09:35:30 UTC, 2 replies.
- [Spark 2.0] ClassNotFoundException is thrown when using Hive - posted by "颜发才 (Yan Facai)" <ya...@gmail.com> on 2016/08/18 09:47:42 UTC, 3 replies.
- Spark Streaming application failing with Token issue - posted by Kamesh <ka...@gmail.com> on 2016/08/18 09:51:27 UTC, 6 replies.
- SparkStreaming source code - posted by Aditya <ad...@augmentiq.co.in> on 2016/08/18 12:30:58 UTC, 1 replies.
- Reporting errors from spark sql - posted by yael aharon <ya...@gmail.com> on 2016/08/18 13:14:09 UTC, 1 replies.
- Standalone executor memory is fixed while executor cores are load balanced between workers - posted by Petr Novak <os...@gmail.com> on 2016/08/18 14:06:21 UTC, 1 replies.
- createDirectStream parallelism - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/08/18 14:22:45 UTC, 0 replies.
- Structured Stream Behavior on failure - posted by Cornelio <co...@gmail.com> on 2016/08/18 15:37:17 UTC, 0 replies.
- Model Persistence - posted by Rich Tarro <ri...@gmail.com> on 2016/08/18 16:00:36 UTC, 1 replies.
- Extra string added to column name? (withColumn & expr) - posted by rachmaninovquartet <ra...@gmail.com> on 2016/08/18 16:57:38 UTC, 0 replies.
- py4j.Py4JException: Method lower([class java.lang.String]) does not exist - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/08/18 17:07:51 UTC, 0 replies.
- Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext. - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/08/18 18:34:37 UTC, 0 replies.
- [Spark2] Error writing "complex" type to CSV - posted by Efe Selcuk <ef...@gmail.com> on 2016/08/18 21:32:35 UTC, 6 replies.
- pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/08/18 21:56:16 UTC, 3 replies.
- Unable to see external table that is created from Hive Context in the list of hive tables - posted by SRK <sw...@gmail.com> on 2016/08/18 22:08:57 UTC, 1 replies.
- Spark streaming - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/08/19 02:04:51 UTC, 0 replies.
- How to continuous update or refresh RandomForestClassificationModel - posted by 陈哲 <cz...@gmail.com> on 2016/08/19 08:21:00 UTC, 1 replies.
- Spark streaming 2, giving error ClassNotFoundException: scala.collection.GenTraversableOnce$class - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/19 18:24:20 UTC, 2 replies.
- Best way to read XML data from RDD - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/08/19 20:07:15 UTC, 13 replies.
- Re: Spark SQL concurrent runs fails with java.util.concurrent.TimeoutException: Futures timed out after [300 seconds] - posted by Davies Liu <da...@databricks.com> on 2016/08/19 21:08:46 UTC, 0 replies.
- "Schemaless" Spark - posted by Efe Selcuk <ef...@gmail.com> on 2016/08/19 21:54:23 UTC, 1 replies.
- Spark 2.0 regression when querying very wide data frames - posted by mhornbech <mo...@datasolvr.com> on 2016/08/19 23:16:37 UTC, 7 replies.
- Re: How Spark HA works - posted by Charles Nnamdi Akalugwu <cp...@gmail.com> on 2016/08/20 03:56:08 UTC, 1 replies.
- mutable.LinkedHashMap kryo serialization issues - posted by Rahul Palamuttam <ra...@gmail.com> on 2016/08/21 01:20:13 UTC, 4 replies.
- DCOS - s3 - posted by Martin Somers <so...@gmail.com> on 2016/08/21 09:19:43 UTC, 0 replies.
- Dataframe corrupted when sqlContext.read.json on a Gzipped file that contains more than one file - posted by Chua Jie Sheng <ch...@gmail.com> on 2016/08/21 11:51:38 UTC, 1 replies.
- Entire XML data as one of the column in DataFrame - posted by sr...@gmail.com on 2016/08/21 20:31:41 UTC, 1 replies.
- Vector size mismatch in logistic regression - Spark ML 2.0 - posted by janardhan shetty <ja...@gmail.com> on 2016/08/21 22:16:15 UTC, 5 replies.
- Hi, - posted by Xi Shen <da...@gmail.com> on 2016/08/22 00:55:13 UTC, 0 replies.
- Populating tables using hive and spark - posted by Nitin Kumar <nk...@gmail.com> on 2016/08/22 07:34:21 UTC, 3 replies.
- updateStateByKey for window batching - posted by Dávid Szakállas <da...@risingstack.com> on 2016/08/22 10:49:41 UTC, 0 replies.
- Fwd: How to avoid RDD shuffling in join after Distributed Matrix calculation - posted by Tharindu Thundeniya <th...@gmail.com> on 2016/08/22 12:14:03 UTC, 0 replies.
- Avoid RDD shuffling in a join after Distributed Matrix operation - posted by Tharindu <th...@gmail.com> on 2016/08/22 12:18:52 UTC, 0 replies.
- Disable logger in SparkR - posted by Yogesh Vyas <in...@gmail.com> on 2016/08/22 13:12:10 UTC, 1 replies.
- Using spark to distribute jobs to standalone servers - posted by Larry White <lj...@gmail.com> on 2016/08/22 14:59:41 UTC, 3 replies.
- UDTRegistration (in Java) - posted by raghukiran <ra...@gmail.com> on 2016/08/22 15:10:06 UTC, 1 replies.
- [pyspark] How to ensure rdd.takeSample produce the same set everytime? - posted by Chua Jie Sheng <ch...@gmail.com> on 2016/08/22 16:24:04 UTC, 0 replies.
- Do we still need to use Kryo serializer in Spark 1.6.2 ? - posted by Eric Ho <er...@analyticsmd.com> on 2016/08/22 18:00:41 UTC, 2 replies.
- spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big") - posted by "Cinquegrana, Piero" <Pi...@neustar.biz> on 2016/08/22 18:58:05 UTC, 6 replies.
- Spark 2.0 - Join statement compile error - posted by Subhajit Purkayastha <sp...@p3si.net> on 2016/08/22 21:00:44 UTC, 9 replies.
- word count on parquet file - posted by shamu <pr...@hotmail.com> on 2016/08/22 21:12:21 UTC, 2 replies.
- Combining multiple models in Spark-ML 2.0 - posted by janardhan shetty <ja...@gmail.com> on 2016/08/22 22:40:19 UTC, 1 replies.
- DataFrameWriter bug after RandomSplit? - posted by evanzamir <za...@gmail.com> on 2016/08/22 22:43:29 UTC, 0 replies.
- Re: Spark with Parquet - posted by shamu <pr...@hotmail.com> on 2016/08/23 01:25:24 UTC, 1 replies.
- Apply ML to grouped dataframe - posted by Wen Pei Yu <yu...@cn.ibm.com> on 2016/08/23 02:04:33 UTC, 8 replies.
- Log rollover in spark streaming jobs - posted by Pradeep <pr...@mail.com> on 2016/08/23 10:44:29 UTC, 1 replies.
- Can one create a dynamic Array and convert it to DF - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/23 11:51:48 UTC, 1 replies.
- question about Broadcast value NullPointerException - posted by Chong Zhang <ch...@gmail.com> on 2016/08/23 12:49:17 UTC, 0 replies.
- Things to do learn Cassandra in Apache Spark Environment - posted by Gokula Krishnan D <em...@gmail.com> on 2016/08/23 15:28:29 UTC, 0 replies.
- Zero Data Loss in Spark with Kafka - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/08/23 15:30:48 UTC, 2 replies.
- Is "spark streaming" streaming or mini-batch? - posted by Aseem Bansal <as...@gmail.com> on 2016/08/23 15:41:44 UTC, 7 replies.
- Breaking down text String into Array elements - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/23 17:42:30 UTC, 5 replies.
- Are RDD's ever persisted to disk? - posted by kant kodali <ka...@gmail.com> on 2016/08/23 18:12:51 UTC, 17 replies.
- Maelstrom: Kafka integration with Spark - posted by Jeoffrey Lim <je...@gmail.com> on 2016/08/23 19:19:54 UTC, 5 replies.
- Spark 2.0 with Kafka 0.10 exception - posted by Srikanth <sr...@gmail.com> on 2016/08/23 20:44:17 UTC, 2 replies.
- spark-jdbc impala with kerberos using yarn-client - posted by twisterius <gr...@gmail.com> on 2016/08/23 21:12:48 UTC, 1 replies.
- How do we process/scale variable size batches in Apache Spark Streaming - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/08/23 22:20:53 UTC, 0 replies.
- DataFrame Data Manipulation - Based on a timestamp column Not Working - posted by Subhajit Purkayastha <sp...@p3si.net> on 2016/08/23 22:46:29 UTC, 1 replies.
- dynamic allocation in Spark 2.0 - posted by Shane Lee <sh...@yahoo.com.INVALID> on 2016/08/24 07:16:51 UTC, 1 replies.
- Future of GraphX - posted by mas <ma...@gmail.com> on 2016/08/24 07:46:11 UTC, 0 replies.
- Spark MLlib:Collaborative Filtering - posted by "Devi P.V" <de...@gmail.com> on 2016/08/24 08:28:42 UTC, 3 replies.
- Best range of parameters for grid search? - posted by Adamantios Corais <ad...@gmail.com> on 2016/08/24 09:26:29 UTC, 0 replies.
- work with russian letters - posted by AlexModestov <Al...@gmail.com> on 2016/08/24 09:37:56 UTC, 1 replies.
- Can we redirect Spark shuffle spill data to HDFS or Alluxio? - posted by "tony.yan@tendcloud.com" <to...@tendcloud.com> on 2016/08/24 12:11:39 UTC, 7 replies.
- Dataframe write to DB , loosing primary key index & data types. - posted by sujeet jog <su...@gmail.com> on 2016/08/24 14:37:01 UTC, 1 replies.
- Best way to calculate intermediate column statistics - posted by Richard Siebeling <rs...@gmail.com> on 2016/08/24 14:42:26 UTC, 8 replies.
- Re: a question about LBFGS in Spark - posted by DB Tsai <db...@dbtsai.com> on 2016/08/24 18:48:12 UTC, 0 replies.
- Fwd: quick question - posted by kant kodali <ka...@gmail.com> on 2016/08/24 20:52:56 UTC, 10 replies.
- What do I loose if I run spark without using HDFS or Zookeeper? - posted by kant kodali <ka...@gmail.com> on 2016/08/24 20:54:42 UTC, 22 replies.
- Sqoop vs spark jdbc - posted by Venkata Penikalapati <ma...@gmail.com> on 2016/08/24 21:39:01 UTC, 8 replies.
- How to compute a net (difference) given a bi-directional stream of numbers using spark streaming? - posted by kant kodali <ka...@gmail.com> on 2016/08/24 21:48:04 UTC, 0 replies.
- Incremental Updates and custom SQL via JDBC - posted by Oldskoola <sa...@outlook.com> on 2016/08/24 23:08:50 UTC, 3 replies.
- Spark Streaming user function exceptions causing hangs - posted by N B <nb...@gmail.com> on 2016/08/25 01:07:25 UTC, 0 replies.
- Spark Logging : log4j.properties or log4j.xml - posted by John Jacobs <ge...@gmail.com> on 2016/08/25 04:23:06 UTC, 0 replies.
- spark 2.0.0 - when saving a model to S3 spark creates temporary files. Why? - posted by Aseem Bansal <as...@gmail.com> on 2016/08/25 05:21:17 UTC, 5 replies.
- Is there anyway Spark UI is set to poll and refreshes itself - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/25 09:55:06 UTC, 9 replies.
- UDF on lpad - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/25 11:06:56 UTC, 4 replies.
- Latest Release of Receiver based Kafka Consumer for Spark Streaming. - posted by Dibyendu Bhattacharya <di...@gmail.com> on 2016/08/25 11:33:41 UTC, 2 replies.
- Kafka message metadata with Dstreams - posted by Pradeep <pr...@mail.com> on 2016/08/25 11:45:11 UTC, 1 replies.
- Re: namespace quota not take effect - posted by Ted Yu <yu...@gmail.com> on 2016/08/25 12:46:03 UTC, 0 replies.
- Pyspark SQL 1.6.0 write problem - posted by Ethan Aubin <et...@gmail.com> on 2016/08/25 15:00:28 UTC, 0 replies.
- SparkStreaming + Flume: org.jboss.netty.channel.ChannelException: Failed to bind to: master60/10.0.10.60:31001 - posted by lu...@sina.com on 2016/08/25 15:21:06 UTC, 0 replies.
- Perform an ALS with TF-IDF output (spark 2.0) - posted by Pasquinell Urbani <pa...@exalitica.com> on 2016/08/25 16:22:21 UTC, 0 replies.
- Caching broadcasted DataFrames? - posted by Jestin Ma <je...@gmail.com> on 2016/08/25 17:07:25 UTC, 1 replies.
- How to output RDD to one file with specific name? - posted by Gavin Yue <yu...@gmail.com> on 2016/08/25 17:15:01 UTC, 1 replies.
- Running yarn with spark not working with Java 8 - posted by Anil Langote <an...@gmail.com> on 2016/08/25 19:15:57 UTC, 0 replies.
- Insert non-null values from dataframe - posted by Selvam Raman <se...@gmail.com> on 2016/08/25 20:23:17 UTC, 2 replies.
- Please assist: Building Docker image containing spark 2.0 - posted by Marco Mistroni <mm...@gmail.com> on 2016/08/25 20:31:52 UTC, 9 replies.
- Converting DataFrame's int column to Double - posted by Marco Mistroni <mm...@gmail.com> on 2016/08/25 21:09:33 UTC, 2 replies.
- How to do this pairing in Spark? - posted by Rex X <dn...@gmail.com> on 2016/08/26 01:00:26 UTC, 3 replies.
- How to install spark with s3 on AWS? - posted by kant kodali <ka...@gmail.com> on 2016/08/26 11:46:26 UTC, 2 replies.
- How to make new composite columns by combining rows in the same group? - posted by Rex X <dn...@gmail.com> on 2016/08/26 11:54:36 UTC, 2 replies.
- unable to start slaves from master (SSH problem) - posted by kant kodali <ka...@gmail.com> on 2016/08/26 12:32:04 UTC, 1 replies.
- zookeeper mesos logging in spark - posted by aecc <al...@gmail.com> on 2016/08/26 15:31:30 UTC, 1 replies.
- Reading parquet files into Spark Streaming - posted by Renato Marroquín Mogrovejo <re...@gmail.com> on 2016/08/26 15:42:00 UTC, 6 replies.
- Spark driver memory breakdown - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/26 16:48:35 UTC, 0 replies.
- Spark 2.0 - Insert/Update to a DataFrame - posted by Subhajit Purkayastha <sp...@p3si.net> on 2016/08/26 16:53:14 UTC, 5 replies.
- spark 2.0 home brew package missing - posted by kalkimann <ka...@gmail.com> on 2016/08/26 17:25:52 UTC, 2 replies.
- EMR for spark job - instance type suggestion - posted by "Saurabh Malviya (samalviy)" <sa...@cisco.com> on 2016/08/26 17:29:28 UTC, 1 replies.
- is there a HTTP2 (v2) endpoint for Spark Streaming? - posted by kant kodali <ka...@gmail.com> on 2016/08/26 19:42:08 UTC, 3 replies.
- Spark 1.6 Streaming with Checkpointing - posted by Benjamin Kim <bb...@gmail.com> on 2016/08/26 20:54:34 UTC, 1 replies.
- Dynamically change executors settings - posted by Vadim Semenov <va...@datadoghq.com> on 2016/08/27 02:40:02 UTC, 1 replies.
- Issues with Spark On Hbase Connector and versions - posted by spats <sp...@gmail.com> on 2016/08/27 11:47:07 UTC, 2 replies.
- Write parquet file from Spark Streaming - posted by Kevin Tran <ke...@gmail.com> on 2016/08/28 01:32:46 UTC, 0 replies.
- How to persist SparkContext? - posted by "Taotao.Li" <ch...@gmail.com> on 2016/08/28 03:32:49 UTC, 2 replies.
- Equivalent of "predict" function from LogisticRegressionWithLBFGS in OneVsRest with LogisticRegression classifier (Spark 2.0) - posted by yaroslav <ya...@gmail.com> on 2016/08/28 09:06:24 UTC, 1 replies.
- UDF/UDAF performance - posted by AssafMendelson <as...@rsa.com> on 2016/08/28 09:53:56 UTC, 0 replies.
- Spark StringType could hold how many characters ? - posted by Kevin Tran <ke...@gmail.com> on 2016/08/28 14:24:16 UTC, 1 replies.
- Best practises to storing data in Parquet files - posted by Kevin Tran <ke...@gmail.com> on 2016/08/28 14:43:51 UTC, 4 replies.
- Design patterns involving Spark - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/08/28 17:04:29 UTC, 13 replies.
- Suggestions for calculating MAU/WAU/DAU - posted by Tal Grynbaum <ta...@gmail.com> on 2016/08/28 18:48:26 UTC, 0 replies.
- S3A + EMR failure when writing Parquet? - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/08/28 19:51:50 UTC, 3 replies.
- Automating lengthy command to pyspark with configuration? - posted by Russell Jurney <ru...@gmail.com> on 2016/08/28 22:30:20 UTC, 2 replies.
- Issue with Spark HBase connector streamBulkGet method - posted by BiksN <bi...@gmail.com> on 2016/08/29 03:57:13 UTC, 0 replies.
- How can we connect RDD from previous job to next job - posted by Sachin Mittal <sj...@gmail.com> on 2016/08/29 04:30:33 UTC, 5 replies.
- json with millisecond timestamp in spark 2 - posted by filousen <fi...@hotmail.com> on 2016/08/29 07:05:56 UTC, 0 replies.
- can I use cassandra for checkpointing during a spark streaming job - posted by kant kodali <ka...@gmail.com> on 2016/08/29 07:14:47 UTC, 0 replies.
- How to acess the WrappedArray - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/08/29 09:27:07 UTC, 3 replies.
- After calling persist, why the size in sparkui is not matching with the actual file size - posted by Rohit Kumar Prusty <Ro...@infosys.com> on 2016/08/29 13:52:30 UTC, 2 replies.
- Spark Streaming batch sequence number - posted by Matt Smith <ma...@gmail.com> on 2016/08/29 15:18:42 UTC, 0 replies.
- Exception during creation of ActorReceiver when running ActorWordCount on CDH 5.5.2 - posted by Ricky Pritchett <or...@yahoo.com.INVALID> on 2016/08/29 15:35:42 UTC, 0 replies.
- Coding in the Spark ml "ecosystem" why is everything private?! - posted by Thunder Stumpges <th...@gmail.com> on 2016/08/29 16:46:19 UTC, 2 replies.
- Great performance improvement of Spark 1.6.2 on our production cluster - posted by Yong Zhang <ja...@hotmail.com> on 2016/08/29 17:16:32 UTC, 0 replies.
- Cleanup after Spark SQL job with window aggregation takes a long time - posted by Jestin Ma <je...@gmail.com> on 2016/08/29 22:57:55 UTC, 0 replies.
- Spark launcher handle and listener not giving state - posted by ckanth99 <ck...@zoho.com> on 2016/08/29 23:36:31 UTC, 2 replies.
- java.lang.RuntimeException: java.lang.AssertionError: assertion failed: A ReceiverSupervisor has not been attached to the receiver yet. - posted by kant kodali <ka...@gmail.com> on 2016/08/29 23:49:23 UTC, 0 replies.
- How to attach a ReceiverSupervisor for a Custom receiver in Spark Streaming? - posted by kant kodali <ka...@gmail.com> on 2016/08/30 00:18:21 UTC, 0 replies.
- Spark 2.0.0 - What all access is needed to save model to S3? - posted by Aseem Bansal <as...@gmail.com> on 2016/08/30 05:20:45 UTC, 1 replies.
- How to use custom class in DataSet - posted by canan chen <cc...@gmail.com> on 2016/08/30 05:39:13 UTC, 1 replies.
- Spark metrics when running with YARN? - posted by Otis Gospodnetić <ot...@gmail.com> on 2016/08/30 05:53:54 UTC, 5 replies.
- How to convert List into json object / json Array - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/08/30 07:06:58 UTC, 2 replies.
- Writing to Hbase table from Spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/08/30 09:13:18 UTC, 2 replies.
- ApplicationMaster + Fair Scheduler + Dynamic resource allocation - posted by Cleosson José Pirani de Souza <cs...@daitangroup.com> on 2016/08/30 11:30:39 UTC, 0 replies.
- broadcast fails on join - posted by AssafMendelson <as...@rsa.com> on 2016/08/30 12:09:11 UTC, 1 replies.
- Hi, guys, does anyone use Spark in finance market? - posted by "Taotao.Li" <ch...@gmail.com> on 2016/08/30 13:13:44 UTC, 0 replies.
- Re: ApplicationMaster + Fair Scheduler + Dynamic resource allocation - posted by 梅西0247 <zh...@dtdream.com> on 2016/08/30 13:21:35 UTC, 0 replies.
- Controlling access to hive/db-tables while using SparkSQL - posted by "Rajani, Arpan" <Ar...@WorldPay.com> on 2016/08/30 14:22:27 UTC, 3 replies.
- ApacheCon Seville CFP closes September 9th - posted by Rich Bowen <rb...@apache.org> on 2016/08/30 15:03:41 UTC, 0 replies.
- Re: Random Forest Classification - posted by Bahubali Jain <ba...@gmail.com> on 2016/08/30 16:57:09 UTC, 3 replies.
- Does Spark on YARN inherit or replace the Hadoop/YARN configs? - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/08/30 17:38:32 UTC, 0 replies.
- How to convert an ArrayType to DenseVector within DataFrame? - posted by evanzamir <za...@gmail.com> on 2016/08/30 18:45:26 UTC, 0 replies.
- newlines inside csv quoted values - posted by Koert Kuipers <ko...@tresata.com> on 2016/08/30 19:40:28 UTC, 0 replies.
- Spark build 1.6.2 error - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/08/30 20:30:43 UTC, 2 replies.
- Re: Dynamic Allocation & Spark Streaming - posted by Liren Ding <sk...@gmail.com> on 2016/08/30 20:43:19 UTC, 0 replies.
- Model abstract class in spark ml - posted by Mohit Jaggi <mo...@gmail.com> on 2016/08/30 20:47:34 UTC, 7 replies.
- reuse the Spark SQL internal metrics - posted by Ai Deng <wx...@gmail.com> on 2016/08/30 21:17:54 UTC, 1 replies.
- Best way to share state in a streaming cluster - posted by "C. Josephson" <cj...@uhana.io> on 2016/08/31 00:57:57 UTC, 0 replies.
- Iterative mapWithState - posted by Matt Smith <ma...@gmail.com> on 2016/08/31 04:08:00 UTC, 0 replies.
- Spark to Kafka communication encrypted ? - posted by Eric Ho <er...@analyticsmd.com> on 2016/08/31 07:03:42 UTC, 3 replies.
- Re: Slow activation using Spark Streaming's new receiver scheduling mechanism - posted by Renxia Wang <re...@gmail.com> on 2016/08/31 07:15:09 UTC, 0 replies.
- Why does spark take so much time for simple task without calculation? - posted by xiefeng <fx...@statestreet.com> on 2016/08/31 09:45:36 UTC, 1 replies.
- Problem with Graphx and number of partitions - posted by alvarobrandon <al...@gmail.com> on 2016/08/31 11:24:57 UTC, 0 replies.
- Re: SVD output within Spark - posted by Yanbo Liang <yb...@gmail.com> on 2016/08/31 11:33:40 UTC, 0 replies.
- Grouping on bucketed and sorted columns - posted by Fridtjof Sander <fr...@googlemail.com> on 2016/08/31 12:45:49 UTC, 0 replies.
- Does a driver jvm houses some rdd partitions? - posted by Jakub Dubovsky <sp...@gmail.com> on 2016/08/31 13:53:50 UTC, 1 replies.
- Iterative update for LocalLDAModel - posted by jamborta <ja...@gmail.com> on 2016/08/31 15:26:10 UTC, 0 replies.
- releasing memory without stopping the spark context ? - posted by Cesar <ce...@gmail.com> on 2016/08/31 15:56:44 UTC, 2 replies.
- Spark 2.0 - Parquet data with fields containing periods "." - posted by Don Drake <do...@gmail.com> on 2016/08/31 17:48:42 UTC, 1 replies.
- Custom return code - posted by Pierre Villard <pi...@gmail.com> on 2016/08/31 18:40:57 UTC, 0 replies.
- Spark jobs failing by looking for TachyonFS - posted by Venkatesh Rudraraju <ve...@gmail.com> on 2016/08/31 20:26:48 UTC, 0 replies.
- AnalysisException exception while parsing XML - posted by sr...@gmail.com on 2016/08/31 21:19:01 UTC, 1 replies.
- Expected benefit of parquet filter pushdown? - posted by Christon DeWan <cd...@apple.com> on 2016/08/31 21:29:38 UTC, 1 replies.
- Fwd: Pyspark Hbase Problem - posted by md mehrab <md...@gmail.com> on 2016/08/31 22:30:06 UTC, 0 replies.