user@spark.apache.org, 2017-05

- Schema Evolution for nested Dataset[T] - posted by Mike Wheeler <ro...@gmail.com> on 2017/05/01 04:12:09 UTC, 4 replies.
- Re: removing columns from file - posted by Steve Loughran <st...@hortonworks.com> on 2017/05/01 13:26:05 UTC, 0 replies.
- Reading table from sql database to apache spark dataframe/RDD - posted by Saulo Ricci <in...@gmail.com> on 2017/05/01 14:50:49 UTC, 1 replies.
- Re: Calculate mode separately for multiple columns in row - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/05/01 16:47:35 UTC, 0 replies.
- Loading postgresql table to spark SyntaxError - posted by Saulo Ricci <in...@gmail.com> on 2017/05/01 19:49:00 UTC, 0 replies.
- RE: Spark-SQL Query Optimization: overlapping ranges - posted by "Lavelle, Shawn" <Sh...@osii.com> on 2017/05/01 22:00:37 UTC, 0 replies.
- Re: RDD blocks on Spark Driver - posted by Daniel Santana <da...@everymundo.com> on 2017/05/02 00:33:02 UTC, 0 replies.
- Re: Initialize Gaussian Mixture Model using Spark ML dataframe API - posted by Yanbo Liang <yb...@gmail.com> on 2017/05/02 04:01:29 UTC, 1 replies.
- Re: Could any one please tell me why this takes forever to finish? - posted by "颜发才 (Yan Facai)" <fa...@gmail.com> on 2017/05/02 05:34:07 UTC, 0 replies.
- OutOfMemoryError - posted by TwUxTLi51Nus <Tw...@posteo.de> on 2017/05/02 07:07:11 UTC, 0 replies.
- --jars does not take remote jar? - posted by Nan Zhu <zh...@gmail.com> on 2017/05/02 15:43:25 UTC, 4 replies.
- do we need to enable writeAheadLogs for DirectStream as well or is it only for indirect stream? - posted by kant kodali <ka...@gmail.com> on 2017/05/02 16:32:12 UTC, 1 replies.
- Joins in Spark - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/05/02 18:22:08 UTC, 5 replies.
- Re: Driver spins hours in query plan optimization - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/05/02 18:44:28 UTC, 0 replies.
- [Spark Streaming] Dynamic Broadcast Variable Update - posted by Nipun Arora <ni...@gmail.com> on 2017/05/02 18:56:49 UTC, 3 replies.
- [ANNOUNCE] Apache Spark 2.1.1 - posted by Michael Armbrust <mi...@databricks.com> on 2017/05/02 22:18:58 UTC, 0 replies.
- spark 1.6 .0 and gridsearchcv - posted by issues solution <is...@gmail.com> on 2017/05/03 07:45:21 UTC, 0 replies.
- map/foreachRDD equivalent for pyspark Structured Streaming - posted by peay <pe...@protonmail.com> on 2017/05/03 08:51:16 UTC, 2 replies.
- Benchmark of XGBoost, Vowpal Wabbit and Spark ML on Criteo 1TB Dataset - posted by pklemenkov <pk...@gmail.com> on 2017/05/03 09:42:11 UTC, 0 replies.
- Redefining Spark native UDFs - posted by Miguel Figueiredo <ol...@gmail.com> on 2017/05/03 13:41:12 UTC, 0 replies.
- Re: parquet optimal file structure - flat vs nested - posted by Steve Loughran <st...@hortonworks.com> on 2017/05/03 13:55:35 UTC, 0 replies.
- Refreshing a persisted RDD - posted by JayeshLalwani <Ja...@capitalone.com> on 2017/05/03 14:30:32 UTC, 4 replies.
- [Spark Streaming] - Killing application from within code - posted by Sidney Feiner <si...@startapp.com> on 2017/05/03 15:44:58 UTC, 2 replies.
- Francis Lau has shared a document on Google Docs with you - posted by fr...@smartsheet.com on 2017/05/03 18:32:36 UTC, 0 replies.
- Pat Ferrel has shared a document on Google Docs with you - posted by pa...@occamsmachete.com on 2017/05/03 19:23:50 UTC, 0 replies.
- What are Analysis Errors With respect to Spark Sql DataFrames and DataSets? - posted by kant kodali <ka...@gmail.com> on 2017/05/03 20:38:32 UTC, 4 replies.
- Re: In-order processing using spark streaming - posted by JayeshLalwani <Ja...@capitalone.com> on 2017/05/03 22:18:50 UTC, 0 replies.
- Spark books - posted by Zeming Yu <ze...@gmail.com> on 2017/05/03 22:35:46 UTC, 6 replies.
- Re: Spark-SQL collect function - posted by JayeshLalwani <Ja...@capitalone.com> on 2017/05/03 22:37:15 UTC, 1 replies.
- Re: Synonym handling replacement issue with UDF in Apache Spark - posted by JayeshLalwani <Ja...@capitalone.com> on 2017/05/03 22:51:29 UTC, 1 replies.
- What is the correct JSON parameter format used to to submit Spark2 apps with YARN REST API? - posted by Kun Liu <li...@gmail.com> on 2017/05/04 00:17:24 UTC, 0 replies.
- Spark 2.1.0 and Hive 2.1.1 - posted by Lohith Samaga M <Lo...@mphasis.com> on 2017/05/04 05:04:14 UTC, 0 replies.
- [CFP] DataWorks Summit/Hadoop Summit Sydney - Call for abstracts - posted by Yanbo Liang <yb...@gmail.com> on 2017/05/04 05:15:07 UTC, 0 replies.
- any support to use Spark UDF in HIVE - posted by Manohar753 <ma...@happiestminds.com> on 2017/05/04 07:03:43 UTC, 0 replies.
- Re: Hive on Spark is not populating correct records - posted by Vikash Pareek <vi...@infoobjects.com> on 2017/05/04 07:40:59 UTC, 0 replies.
- unable to find how to integrate SparkSession with a Custom Receiver. - posted by kant kodali <ka...@gmail.com> on 2017/05/04 07:43:19 UTC, 2 replies.
- Create multiple columns in pyspak with one shot - posted by issues solution <is...@gmail.com> on 2017/05/04 07:55:22 UTC, 1 replies.
- Kerberos impersonation of a Spark Context at runtime - posted by matd <ma...@gmail.com> on 2017/05/04 09:13:54 UTC, 2 replies.
- Normalize columns items for Onehotencoder - posted by issues solution <is...@gmail.com> on 2017/05/04 12:29:42 UTC, 0 replies.
- scalastyle violation on mvn install but not on mvn package - posted by yiskylee <yi...@gmail.com> on 2017/05/04 14:45:22 UTC, 7 replies.
- Spark Streaming 2.1 - slave parallel recovery - posted by Dominik Safaric <do...@gmail.com> on 2017/05/04 20:00:56 UTC, 0 replies.
- long running jobs with Spark - posted by "Afshin, Bardia" <Ba...@capitalone.com> on 2017/05/04 21:45:11 UTC, 0 replies.
- imbalance classe inside RANDOMFOREST CLASSIFIER - posted by issues solution <is...@gmail.com> on 2017/05/05 07:58:36 UTC, 1 replies.
- org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated - posted by Jone Zhang <jo...@gmail.com> on 2017/05/05 08:53:18 UTC, 1 replies.
- hbase + spark + hdfs - posted by mathieu ferlay <mf...@gnubila.fr> on 2017/05/05 13:13:11 UTC, 1 replies.
- Reading ORC file - fine on 1.6; GC timeout on 2+ - posted by Nick Chammas <ni...@gmail.com> on 2017/05/05 13:55:13 UTC, 0 replies.
- Structured Streaming + initialState - posted by Patrick McGloin <mc...@gmail.com> on 2017/05/05 14:35:38 UTC, 2 replies.
- how to get assertDataFrameEquals ignore nullable - posted by A Shaikh <sh...@gmail.com> on 2017/05/05 14:39:34 UTC, 1 replies.
- Crossvalidator after fit - posted by issues solution <is...@gmail.com> on 2017/05/05 14:40:12 UTC, 1 replies.
- Where is release 2.1.1? - posted by da...@ontrenet.com on 2017/05/05 14:40:31 UTC, 1 replies.
- is Spark Application code dependent on which mode we run? - posted by kant kodali <ka...@gmail.com> on 2017/05/05 17:39:29 UTC, 0 replies.
- Re: Join streams Apache Spark - posted by saulshanabrook <s....@gmail.com> on 2017/05/06 18:31:33 UTC, 8 replies.
- take the difference between two columns of a dataframe in pyspark - posted by Zeming Yu <ze...@gmail.com> on 2017/05/07 01:49:31 UTC, 2 replies.
- how to check whether spill over to hard drive happened or not - posted by Zeming Yu <ze...@gmail.com> on 2017/05/07 04:10:16 UTC, 0 replies.
- Issue upgrading to Spark 2.1.1 from 2.1.0 - posted by mhornbech <mo...@datasolvr.com> on 2017/05/07 21:14:50 UTC, 1 replies.
- Spark 2.1.0 with Hive 2.1.1? - posted by Lohith Samaga M <Lo...@mphasis.com> on 2017/05/08 07:03:13 UTC, 0 replies.
- how to set up h2o sparkling water on jupyter notebook on a windows machine - posted by Zeming Yu <ze...@gmail.com> on 2017/05/08 11:52:07 UTC, 0 replies.
- Spark Shell issue on HDInsight - posted by ayan guha <gu...@gmail.com> on 2017/05/08 12:01:02 UTC, 6 replies.
- Why does dataset.union fails but dataset.rdd.union execute correctly? - posted by Dirceu Semighini Filho <di...@gmail.com> on 2017/05/08 18:07:42 UTC, 8 replies.
- Updating schemas - posted by Jorge Magallón <jr...@gmail.com> on 2017/05/08 21:47:28 UTC, 0 replies.
- How to read large size files from a directory ? - posted by ashwini anand <aa...@gmail.com> on 2017/05/09 06:47:36 UTC, 2 replies.
- RDD.cacheDataSet() not working intermittently - posted by ja...@accenture.com on 2017/05/09 06:51:26 UTC, 0 replies.
- how to mark a (bean) class with schema for catalyst ? - posted by Yang <te...@gmail.com> on 2017/05/09 07:21:08 UTC, 8 replies.
- SPARK randomforestclassifer and balancing classe - posted by issues solution <is...@gmail.com> on 2017/05/09 15:39:58 UTC, 0 replies.
- Multiple CSV libs causes issues spark 2.1 - posted by "lucas.gary@gmail.com" <lu...@gmail.com> on 2017/05/09 21:02:18 UTC, 6 replies.
- [jira] Lantao Jin shared "SPARK-20680: Spark-sql do not support for void column datatype of view" with you - posted by "Lantao Jin (JIRA)" <ji...@apache.org> on 2017/05/10 03:46:04 UTC, 0 replies.
- features IMportance - posted by issues solution <is...@gmail.com> on 2017/05/10 07:04:04 UTC, 0 replies.
- URGENT : - posted by issues solution <is...@gmail.com> on 2017/05/10 07:26:07 UTC, 0 replies.
- CrossValidator and stackoverflowError - posted by issues solution <is...@gmail.com> on 2017/05/10 09:43:12 UTC, 0 replies.
- running spark program on intellij connecting to remote master for cluster - posted by s t <se...@hotmail.com> on 2017/05/10 09:51:39 UTC, 2 replies.
- Why spark.sql.autoBroadcastJoinThreshold not available - posted by Jone Zhang <jo...@gmail.com> on 2017/05/10 11:10:32 UTC, 2 replies.
- [Spark Streaming] Unknown delay in event timeline - posted by Zhiwen Sun <pe...@gmail.com> on 2017/05/10 11:40:52 UTC, 0 replies.
- [WARN] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/05/10 12:40:00 UTC, 2 replies.
- [Spark Core]: Python and Scala generate different DAGs for identical code - posted by Pavel Klemenkov <pk...@gmail.com> on 2017/05/10 14:11:17 UTC, 6 replies.
- incremental broadcast join - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/05/10 16:28:36 UTC, 0 replies.
- Re: [EXT] Re: [Spark Core]: Python and Scala generate different DAGs for identical code - posted by "Michael Mansour (CS)" <Mi...@symantec.com> on 2017/05/10 18:52:41 UTC, 0 replies.
- CSV output with JOBUUID - posted by Swapnil Shinde <sw...@gmail.com> on 2017/05/10 20:23:15 UTC, 0 replies.
- unsubscribe - posted by wi...@gmail.com on 2017/05/10 22:49:12 UTC, 11 replies.
- Spark <--> S3 flakiness - posted by "lucas.gary@gmail.com" <lu...@gmail.com> on 2017/05/11 05:07:21 UTC, 14 replies.
- Reading Avro messages from Kafka using Structured Streaming in Spark 2.1 - posted by Revin Chalil <rc...@expedia.com> on 2017/05/11 06:21:25 UTC, 2 replies.
- BinaryClassificationMetrics only supports AreaUnderPR and AreaUnderROC? - posted by Lan Jiang <la...@gmail.com> on 2017/05/11 14:31:16 UTC, 1 replies.
- Spark consumes more memory - posted by "Anantharaman, Srinatha (Contractor)" <Sr...@comcast.com> on 2017/05/11 15:46:35 UTC, 2 replies.
- Matrix multiplication and cluster / partition / blocks configuration - posted by John Compitello <jo...@broadinstitute.org> on 2017/05/11 21:12:03 UTC, 0 replies.
- Best Practice for Enum in Spark SQL - posted by Mike Wheeler <ro...@gmail.com> on 2017/05/12 03:07:18 UTC, 1 replies.
- CROSSVALIDATION and hypotetic fail - posted by issues solution <is...@gmail.com> on 2017/05/12 06:54:00 UTC, 1 replies.
- Re: GraphX subgraph from list of VertexIds - posted by Robineast <Ro...@xense.co.uk> on 2017/05/12 07:40:24 UTC, 0 replies.
- spark-submit hangs at the end - posted by "Nagalingam, Karthikeyan" <KA...@netapp.com> on 2017/05/12 14:26:35 UTC, 0 replies.
- Spark Shuffle Encryption - posted by Shashi Vishwakarma <sh...@gmail.com> on 2017/05/12 16:11:41 UTC, 1 replies.
- Convert DStream into Streaming Dataframe - posted by Tejinder Aulakh <ta...@salesforce.com> on 2017/05/12 16:49:50 UTC, 2 replies.
- Question on whether to use Java 8 or Scala for writing Spark applications - posted by raghavendran_c <ra...@hotmail.com> on 2017/05/12 17:00:46 UTC, 0 replies.
- Restful API Spark Application - posted by Nipun Arora <ni...@gmail.com> on 2017/05/12 20:00:18 UTC, 5 replies.
- spark write hex null string terminates into columns - posted by "Afshin, Bardia" <Ba...@capitalone.com> on 2017/05/13 00:10:51 UTC, 0 replies.
- what is the difference between json format vs kafka format? - posted by kant kodali <ka...@gmail.com> on 2017/05/13 10:42:10 UTC, 5 replies.
- what does this error mean? - posted by Zeming Yu <ze...@gmail.com> on 2017/05/13 12:21:50 UTC, 1 replies.
- Is GraphX really deprecated? - posted by Sergey Zhemzhitsky <sz...@gmail.com> on 2017/05/13 13:00:19 UTC, 2 replies.
- Cassandra Simple Insert Statement using Spark SQL Fails with org.apache.spark.sql.catalyst.parser.ParseException - posted by Abdulfattah Safa <fa...@gmail.com> on 2017/05/14 11:57:30 UTC, 2 replies.
- RE: Spark SQL DataFrame to Kafka Topic - posted by Revin Chalil <rc...@expedia.com> on 2017/05/14 16:31:40 UTC, 3 replies.
- [PYTHON] PySpark typing hints - posted by Maciej Szymkiewicz <ms...@gmail.com> on 2017/05/14 21:44:17 UTC, 0 replies.
- spark on yarn cluster model can't use saveAsTable ? - posted by lk_spark <lk...@163.com> on 2017/05/15 07:52:33 UTC, 0 replies.
- save SPark ml - posted by issues solution <is...@gmail.com> on 2017/05/15 08:32:47 UTC, 1 replies.
- Any solution for this? - posted by Aakash Basu <aa...@gmail.com> on 2017/05/15 08:41:43 UTC, 0 replies.
- Kafka 0.8.x / 0.9.x support in structured streaming - posted by Swapnil Chougule <th...@gmail.com> on 2017/05/15 09:25:43 UTC, 1 replies.
- Test - posted by nayan sharma <na...@gmail.com> on 2017/05/15 09:52:52 UTC, 0 replies.
- spark sql insert hive non-partitioned table failure(java.io.NotSerializableException: org.apache.hadoop.mapreduce.Job) - posted by 李斌松 <li...@gmail.com> on 2017/05/15 10:36:50 UTC, 1 replies.
- ElasticSearch Spark error - posted by nayan sharma <na...@gmail.com> on 2017/05/15 11:18:44 UTC, 2 replies.
- How can i merge multiple rows to one row in sparksql or hivesql? - posted by Jone Zhang <jo...@gmail.com> on 2017/05/15 13:15:30 UTC, 4 replies.
- Adding worker dynamically in standalone mode - posted by se...@nomura.com on 2017/05/15 14:27:04 UTC, 2 replies.
- Spark streaming - TIBCO EMS - posted by Pradeep <pr...@mail.com> on 2017/05/15 15:47:07 UTC, 1 replies.
- Application dies, Driver keeps on running - posted by map reduced <k3...@gmail.com> on 2017/05/15 22:01:40 UTC, 3 replies.
- How to print data to console in structured streaming using Spark 2.1.0? - posted by kant kodali <ka...@gmail.com> on 2017/05/16 07:36:06 UTC, 8 replies.
- Spark streaming app leaking memory? - posted by Srikanth <sr...@gmail.com> on 2017/05/16 16:12:33 UTC, 0 replies.
- How to replay stream between 2 offsets? - posted by ranjitreddy <ra...@yahoo.com> on 2017/05/16 17:39:18 UTC, 0 replies.
- Re: How does preprocessing fit into Spark MLlib pipeline - posted by Adrian Stern <ad...@vidora.com> on 2017/05/16 18:34:23 UTC, 0 replies.
- Spark Streaming 2.1 recovery - posted by Dominik Safaric <do...@gmail.com> on 2017/05/16 19:54:54 UTC, 0 replies.
- Documentation on "Automatic file coalescing for native data sources"? - posted by Daniel Siegmann <ds...@securityscorecard.io> on 2017/05/16 20:12:54 UTC, 4 replies.
- Cannot create parquet with snappy output for hive external table - posted by Dhimant <dh...@gmail.com> on 2017/05/16 22:40:02 UTC, 0 replies.
- Re: s3 bucket access/read file - posted by jazzed <cr...@gmail.com> on 2017/05/16 23:10:02 UTC, 1 replies.
- KTable like functionality in structured streaming - posted by Stephen Fletcher <st...@gmail.com> on 2017/05/17 00:41:19 UTC, 1 replies.
- Spark Streaming: NullPointerException when restoring Spark Streaming job from hdfs/s3 checkpoint - posted by Richard Moorhead <ri...@c2fo.com> on 2017/05/17 02:13:58 UTC, 0 replies.
- Re: Not able pass 3rd party jars to mesos executors - posted by Satya Narayan1 <sa...@gmail.com> on 2017/05/17 05:00:42 UTC, 1 replies.
- spark-submit in mesos cluster mode --jars option not working - posted by Satya Narayan1 <sa...@gmail.com> on 2017/05/17 05:17:22 UTC, 1 replies.
- Re: checkpointing without streaming? - posted by "neelesh.sa" <sa...@gmail.com> on 2017/05/17 07:01:38 UTC, 3 replies.
- Parquet file amazon s3a timeout - posted by Karin Valisova <ka...@datapine.com> on 2017/05/17 10:13:50 UTC, 1 replies.
- Cloudera 5.8.0 and spark 2.1.1 - posted by issues solution <is...@gmail.com> on 2017/05/17 11:30:12 UTC, 1 replies.
- spark cluster performance decreases by adding more nodes - posted by Junaid Nasir <jn...@an10.io> on 2017/05/17 15:13:35 UTC, 5 replies.
- Spark Launch programatically - Basics! - posted by Nipun Arora <ni...@gmail.com> on 2017/05/17 20:47:03 UTC, 1 replies.
- How to flatten struct into a dataframe? - posted by kant kodali <ka...@gmail.com> on 2017/05/17 23:06:11 UTC, 2 replies.
- Jupyter spark Scala notebooks - posted by upendra 1991 <up...@yahoo.com.INVALID> on 2017/05/18 02:22:14 UTC, 5 replies.
- How to see the full contents of dataset or dataframe is structured streaming? - posted by kant kodali <ka...@gmail.com> on 2017/05/18 02:55:06 UTC, 4 replies.
- spark ML Recommender program - posted by Arun <ar...@gmail.com> on 2017/05/18 03:15:31 UTC, 4 replies.
- Optimizing dataset joins - posted by Daniel Haviv <da...@gmail.com> on 2017/05/18 08:46:06 UTC, 0 replies.
- Spark Structured Streaming is taking too long to process 2KB messages - posted by kant kodali <ka...@gmail.com> on 2017/05/18 09:39:58 UTC, 1 replies.
- SparkAppHandle - get Input and output streams - posted by Nipun Arora <ni...@gmail.com> on 2017/05/18 17:10:20 UTC, 1 replies.
- Forcing either Hive or Spark SQL representation for metastore - posted by Justin Miller <ju...@protectwise.com> on 2017/05/18 22:01:16 UTC, 0 replies.
- IOT in Spark - posted by Gaurav1809 <ga...@gmail.com> on 2017/05/19 03:58:29 UTC, 3 replies.
- How does spark hiveserver dynamically update function dependent jar? - posted by 李斌松 <li...@gmail.com> on 2017/05/19 05:16:40 UTC, 0 replies.
- Spark 2 Kafka Direct Stream Consumer Issue - posted by Jayadeep J <ja...@gmail.com> on 2017/05/19 08:00:49 UTC, 1 replies.
- Is there a Kafka sink for Spark Structured Streaming - posted by ka...@gmail.com on 2017/05/19 08:45:40 UTC, 0 replies.
- Re: Is there a Kafka sink for Spark Structured Streaming - posted by Patrick McGloin <mc...@gmail.com> on 2017/05/19 12:55:59 UTC, 5 replies.
- java.lang.OutOfMemoryError - posted by Kürşat Kurt <ku...@kursatkurt.com> on 2017/05/19 13:58:23 UTC, 0 replies.
- Reading PDF/text/word file efficiently with Spark - posted by tesmai4 <te...@gmail.com> on 2017/05/19 17:43:54 UTC, 3 replies.
- Shuffle read is very slow in spark - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/05/19 19:05:01 UTC, 0 replies.
- Bizarre UI Behavior after migration - posted by Miles Crawford <mi...@allenai.org> on 2017/05/19 21:45:41 UTC, 2 replies.
- Re: Spark UI shows Jobs are processing, but the files are already written to S3 - posted by Miles Crawford <mi...@allenai.org> on 2017/05/19 21:47:21 UTC, 0 replies.
- SparkSQL not able to read a empty table location - posted by "Bajpai, Amit X. -ND" <Am...@disney.com> on 2017/05/20 00:44:58 UTC, 3 replies.
- GraphFrames 0.5.0 - critical bug fix + other improvements - posted by Joseph Bradley <jo...@databricks.com> on 2017/05/20 00:52:19 UTC, 0 replies.
- Re: [Spark Streamiing] Streaming job failing consistently after 1h - posted by Manish Malhotra <ma...@gmail.com> on 2017/05/20 05:52:53 UTC, 0 replies.
- Spark Streaming: Custom Receiver OOM consistently - posted by Manish Malhotra <ma...@gmail.com> on 2017/05/20 05:54:21 UTC, 4 replies.
- couple naive questions on Spark Structured Streaming - posted by kant kodali <ka...@gmail.com> on 2017/05/21 00:39:27 UTC, 2 replies.
- Rmse recomender system - posted by Arun <ar...@gmail.com> on 2017/05/21 02:48:10 UTC, 1 replies.
- Sampling data on RDD vs sampling data on Dataframes - posted by Marco Didonna <m....@gmail.com> on 2017/05/21 15:50:37 UTC, 0 replies.
- Are tachyon and akka removed from 2.1.1 please - posted by 萝卜丝炒饭 <14...@qq.com> on 2017/05/22 01:15:08 UTC, 0 replies.
- Spark on Mesos failure, when launching a simple job - posted by ved_kpl <ve...@gmail.com> on 2017/05/22 07:39:47 UTC, 0 replies.
- KMeans Clustering is not Reproducible - posted by Christoph Bruecke <ca...@gmail.com> on 2017/05/22 13:42:38 UTC, 5 replies.
- Re: Are tachyon and akka removed from 2.1.1 please - posted by Gene Pang <ge...@gmail.com> on 2017/05/22 14:19:44 UTC, 4 replies.
- Broadcasted Object is empty in executors. - posted by Pedro Tuero <tu...@gmail.com> on 2017/05/22 19:42:25 UTC, 0 replies.
- Convert camelCase to snake_case when saving Dataframe/Dataset to parquet? - posted by Mike Wheeler <ro...@gmail.com> on 2017/05/22 20:53:02 UTC, 2 replies.
- streaming of binary files in PySpark - posted by Yogesh Vyas <in...@gmail.com> on 2017/05/23 05:02:26 UTC, 0 replies.
- [Stateful Spark Streaming] Issues on Restart of job from checkpoint in v1.6.2 - Initial State RDD for MapWithState not attached to Spark Context - posted by "Jalal, Jamal" <ja...@sap.com> on 2017/05/23 05:04:47 UTC, 0 replies.
- Custom function cannot be accessed across database - posted by 李斌松 <li...@gmail.com> on 2017/05/23 06:13:20 UTC, 0 replies.
- Re: OptionalDataException during Naive Bayes Training - posted by elitejyo <el...@yahoo.co.in> on 2017/05/23 10:49:51 UTC, 0 replies.
- How to generate stage for this RDD DAG please? - posted by 萝卜丝炒饭 <14...@qq.com> on 2017/05/23 11:45:22 UTC, 0 replies.
- Dependencies for starting Master / Worker in maven - posted by Jens Teglhus Møller <dj...@gmail.com> on 2017/05/23 12:21:57 UTC, 1 replies.
- user-unsubscribe@spark.apache.org - posted by wi...@gmail.com on 2017/05/23 12:55:09 UTC, 6 replies.
- 2.2. release date ? - posted by mojhaha kiklasds <se...@gmail.com> on 2017/05/23 16:41:21 UTC, 3 replies.
- Are there any Kafka forEachSink examples? - posted by kant kodali <ka...@gmail.com> on 2017/05/23 18:35:31 UTC, 2 replies.
- Impact of coalesce operation before writing dataframe - posted by Andrii Biletskyi <an...@yahoo.com.INVALID> on 2017/05/23 19:14:05 UTC, 5 replies.
- Spark Application hangs without trigger SparkShutdownHook - posted by Xiaoye Sun <su...@gmail.com> on 2017/05/23 20:04:11 UTC, 0 replies.
- One question / kerberos, yarn-cluster -> connection to hbase - posted by Sudhir Jangir <su...@infoobjects.com> on 2017/05/24 17:07:18 UTC, 2 replies.
- ML :- Spark Cluster + Parameter servers - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2017/05/24 18:34:24 UTC, 0 replies.
- [PySpark] - Broadcast Variable Pickle Registry Usage? - posted by "Michael Mansour (CS)" <Mi...@symantec.com> on 2017/05/24 19:13:21 UTC, 0 replies.
- Running into the same problem as JIRA SPARK-19268 - posted by kant kodali <ka...@gmail.com> on 2017/05/24 22:35:22 UTC, 15 replies.
- Questions regarding Jobs, Stages and Caching - posted by ramnavan <hi...@gmail.com> on 2017/05/25 05:28:43 UTC, 5 replies.
- strange warning - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/05/25 06:54:58 UTC, 1 replies.
- Re: Sharing my DataFrame (DataSet) cheat sheet. - posted by "颜发才 (Yan Facai)" <fa...@gmail.com> on 2017/05/25 06:58:09 UTC, 0 replies.
- Structured Streaming from Parquet - posted by Paul Corley <pa...@ignitionone.com> on 2017/05/25 14:13:05 UTC, 2 replies.
- access error while trying to run distcp from source cluster - posted by nancy henry <na...@gmail.com> on 2017/05/25 14:34:55 UTC, 0 replies.
- shuffle write is very slow - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/05/25 21:57:04 UTC, 0 replies.
- get partition locations in spark - posted by girish hilage <gi...@yahoo.com.INVALID> on 2017/05/26 03:37:55 UTC, 0 replies.
- Disable queuing of spark job on Mesos cluster if sufficient resources are not found - posted by "Mevada, Vatsal" <Me...@sky.optymyze.com> on 2017/05/26 11:38:00 UTC, 3 replies.
- convert ps to jpg file - posted by Selvam Raman <se...@gmail.com> on 2017/05/26 13:47:09 UTC, 0 replies.
- Spark checkpoint - nonstreaming - posted by Priya <pm...@gmail.com> on 2017/05/26 14:06:27 UTC, 4 replies.
- using pandas and pyspark to run ETL job - always failing after about 40 minutes - posted by Zeming Yu <ze...@gmail.com> on 2017/05/26 14:32:06 UTC, 0 replies.
- - posted by Anton Kravchenko <kr...@gmail.com> on 2017/05/26 15:00:17 UTC, 0 replies.
- Temp checkpoint directory for EMR (S3 or HDFS) - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/05/26 16:08:57 UTC, 2 replies.
- [Spark Streaming] DAG Execution Model Clarification - posted by Nipun Arora <ni...@gmail.com> on 2017/05/26 17:11:02 UTC, 0 replies.
- examples for flattening dataframes using pyspark - posted by Zeming Yu <ze...@gmail.com> on 2017/05/27 10:18:03 UTC, 1 replies.
- [WARN] org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/05/28 09:21:55 UTC, 0 replies.
- Spark sql 2.1.0 thrift jdbc server - create table xxx as select * from yyy sometimes get error - posted by 道玉 <z....@qq.com> on 2017/05/28 13:16:49 UTC, 0 replies.
- Re: Spark sql 2.1.0 thrift jdbc server - create table xxx as select * from yyy sometimes get error - posted by 道玉 <z....@qq.com> on 2017/05/28 13:52:21 UTC, 2 replies.
- Using SparkContext in Executors - posted by Abdulfattah Safa <fa...@gmail.com> on 2017/05/28 13:56:42 UTC, 5 replies.
- [Spark Streaming] DAG Output Processing mechanism - posted by Nipun Arora <ni...@gmail.com> on 2017/05/28 13:59:51 UTC, 2 replies.
- Dynamically working out upperbound in JDBC connection to Oracle DB - posted by Mich Talebzadeh <mi...@gmail.com> on 2017/05/29 16:11:45 UTC, 4 replies.
- Schema Evolution Parquet vs Avro - posted by Joel D <ga...@gmail.com> on 2017/05/30 02:04:13 UTC, 1 replies.
- help require in converting CassandraRDD to VertexRDD & EdgeRDD - posted by Tania Khan <me...@gmail.com> on 2017/05/30 07:36:22 UTC, 0 replies.
- Message getting lost in Kafka + Spark Streaming - posted by Vikash Pareek <vi...@infoobjects.com> on 2017/05/30 12:59:56 UTC, 1 replies.
- No TypeTag Available for String - posted by krishmah <kr...@gmail.com> on 2017/05/30 17:01:24 UTC, 0 replies.
- Re: Random Forest hangs without trace of error - posted by Sumona Routh <su...@gmail.com> on 2017/05/30 17:29:30 UTC, 1 replies.
- foreachPartition in Spark Java API - posted by Anton Kravchenko <kr...@gmail.com> on 2017/05/30 17:58:54 UTC, 2 replies.
- Checkpointing fro reduceByKeyAndWindow with a window size of 1 hour and 24 hours - posted by SRK <sw...@gmail.com> on 2017/05/30 21:36:55 UTC, 0 replies.
- How to convert Dataset to Dataset in Spark Structured Streaming? - posted by kant kodali <ka...@gmail.com> on 2017/05/31 02:31:19 UTC, 4 replies.
- Worker node log not showed - posted by Paolo Patierno <pp...@live.com> on 2017/05/31 07:42:41 UTC, 2 replies.
- Help in Parsing 'Categorical' type of data - posted by Amlan Jyoti <am...@tcs.com> on 2017/05/31 07:58:33 UTC, 0 replies.
- Creating Dataframe by querying Impala - posted by morfious902002 <an...@gmail.com> on 2017/05/31 14:51:50 UTC, 0 replies.
- An Architecture question on the use of virtualised clusters - posted by Mich Talebzadeh <mi...@gmail.com> on 2017/05/31 16:07:06 UTC, 0 replies.
- Running into the same problem as JIRA SPARK-20325 - posted by kant kodali <ka...@gmail.com> on 2017/05/31 18:44:35 UTC, 1 replies.
- Problem with master webui with reverse proxy when workers >= 10 - posted by tmckay <tm...@redhat.com> on 2017/05/31 19:46:31 UTC, 0 replies.
- mapWithState termination - posted by Dominik Safaric <do...@gmail.com> on 2017/05/31 20:07:02 UTC, 0 replies.
- good http sync client to be used with spark - posted by vimal dinakaran <vi...@gmail.com> on 2017/05/31 20:08:17 UTC, 0 replies.
- [apache-spark] Re: Problem with master webui with reverse proxy when workers >= 10 - posted by Trevor McKay <tm...@redhat.com> on 2017/05/31 21:25:58 UTC, 0 replies.
- The following Error seems to happen once in every ten minutes (Spark Structured Streaming)? - posted by kant kodali <ka...@gmail.com> on 2017/05/31 23:05:18 UTC, 0 replies.
- Question about mllib.recommendation.ALS - posted by Sahib, , Search, , <sa...@coupang.com> on 2017/05/31 23:48:09 UTC, 0 replies.