user@spark.apache.org, 2016-07

- RE: Spark jobs - posted by Joaquin Alzola <Jo...@lebara.com> on 2016/07/01 00:36:38 UTC, 0 replies.
- RE: Spark 2.0 Continuous Processing - posted by kmat <ku...@hotmail.com> on 2016/07/01 01:01:13 UTC, 0 replies.
- Looking for help about stackoverflow in spark - posted by johnzeng <jo...@fossil.com> on 2016/07/01 02:03:58 UTC, 1 replies.
- Re: One map per folder in spark or Hadoop - posted by Sun Rui <su...@163.com> on 2016/07/01 02:16:21 UTC, 3 replies.
- Why so many parquet file part when I store data in Alluxio or File? - posted by Chanh Le <gi...@gmail.com> on 2016/07/01 02:29:28 UTC, 9 replies.
- HiveContext - posted by manish jaiswal <ma...@gmail.com> on 2016/07/01 03:38:22 UTC, 2 replies.
- How spark makes partition when we insert data using the Sql query, and how the permissions to the partitions is assigned.? - posted by shiv4nsh <sh...@knoldus.com> on 2016/07/01 07:43:59 UTC, 1 replies.
- JavaStreamingContext.stop() hangs - posted by manoop <su...@umalkar.com> on 2016/07/01 08:12:58 UTC, 1 replies.
- Re: Remote RPC client disassociated - posted by Akhil Das <ak...@hacked.work> on 2016/07/01 10:38:09 UTC, 2 replies.
- Re: RDD to DataFrame question with JsValue in the mix - posted by Akhil Das <ak...@hacked.work> on 2016/07/01 10:42:04 UTC, 1 replies.
- Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests - posted by Akhil Das <ak...@hacked.work> on 2016/07/01 10:46:20 UTC, 4 replies.
- Deploying ML Pipeline Model - posted by Rishabh Bhardwaj <rb...@gmail.com> on 2016/07/01 11:54:06 UTC, 10 replies.
- Spark 2.0.0-preview ... problem with jackson core version - posted by Paolo Patierno <pp...@live.com> on 2016/07/01 14:24:03 UTC, 9 replies.
- How are threads created in SQL Executor? - posted by emiretsk <eu...@gmail.com> on 2016/07/01 15:42:32 UTC, 1 replies.
- Re: Random Forest Classification - posted by Rich Tarro <ri...@gmail.com> on 2016/07/01 16:24:53 UTC, 1 replies.
- Thrift JDBC server - why only one per machine and only yarn-client - posted by Egor Pahomov <pa...@gmail.com> on 2016/07/01 16:32:47 UTC, 9 replies.
- Cluster mode deployment from jar in S3 - posted by Ashic Mahtab <as...@live.com> on 2016/07/01 16:45:12 UTC, 5 replies.
- Re: Aggregator (Spark 2.0) skips aggregation is zero(0 returns null - posted by Amit Sela <am...@gmail.com> on 2016/07/01 21:04:48 UTC, 1 replies.
- Spark driver assigning splits to incorrect workers - posted by Raajen <ra...@gmail.com> on 2016/07/01 21:46:22 UTC, 2 replies.
- Re: output part files max size - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/07/01 23:26:08 UTC, 0 replies.
- Re: Best way to merge final output part files created by Spark job - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/07/01 23:36:00 UTC, 0 replies.
- Enforcing shuffle hash join - posted by Lalitha MV <la...@gmail.com> on 2016/07/01 23:56:21 UTC, 7 replies.
- spark parquet too many small files ? - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/07/02 00:17:22 UTC, 5 replies.
- Re: Spark ML - Java implementation of custom Transformer - posted by Yanbo Liang <yb...@gmail.com> on 2016/07/02 07:23:38 UTC, 0 replies.
- Re: Custom Optimizer - posted by Yanbo Liang <yb...@gmail.com> on 2016/07/02 07:28:06 UTC, 0 replies.
- Re: Ideas to put a Spark ML model in production - posted by Yanbo Liang <yb...@gmail.com> on 2016/07/02 07:45:03 UTC, 1 replies.
- Re: Get both feature importance and ROC curve from a random forest classifier - posted by Yanbo Liang <yb...@gmail.com> on 2016/07/02 08:04:08 UTC, 1 replies.
- Re: Trainning a spark ml linear regresion model fail after migrating from 1.5.2 to 1.6.1 - posted by Yanbo Liang <yb...@gmail.com> on 2016/07/02 08:19:13 UTC, 0 replies.
- Re: Several questions about how pyspark.ml works - posted by Yanbo Liang <yb...@gmail.com> on 2016/07/02 08:30:16 UTC, 0 replies.
- Spark-13979: issues with hadoopConf - posted by Gil Vernik <GI...@il.ibm.com> on 2016/07/02 12:06:54 UTC, 0 replies.
- Working of Streaming Kmeans - posted by Biplob Biswas <re...@gmail.com> on 2016/07/02 14:48:41 UTC, 2 replies.
- latest version of Spark to work OK as Hive engine - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/07/02 18:30:52 UTC, 0 replies.
- Bootstrap Action to Install Spark 2.0 on EMR? - posted by Renxia Wang <re...@gmail.com> on 2016/07/03 05:46:01 UTC, 1 replies.
- AMQP extension for Apache Spark Streaming (messaging/IoT) - posted by Paolo Patierno <pp...@live.com> on 2016/07/03 08:41:36 UTC, 1 replies.
- Re: 'numBins' property not honoured in BinaryClassificationMetrics class when spark.default.parallelism is not set to 1 - posted by sneha29shukla <sn...@gmail.com> on 2016/07/03 10:04:17 UTC, 1 replies.
- JAr files into python3 - posted by Joaquin Alzola <Jo...@lebara.com> on 2016/07/03 20:01:27 UTC, 0 replies.
- Saving parquet table as uncompressed with write.mode("overwrite"). - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/03 21:42:22 UTC, 3 replies.
- Graphframe Error - posted by Arun Patel <ar...@gmail.com> on 2016/07/03 22:48:10 UTC, 6 replies.
- Custom RDD: Report Size of Partition in Bytes to Spark - posted by Pedro Rodriguez <sk...@gmail.com> on 2016/07/04 02:46:10 UTC, 2 replies.
- How to struct data in parquet format? - posted by Chanh Le <gi...@gmail.com> on 2016/07/04 03:28:13 UTC, 0 replies.
- ORC or parquet with Spark - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/07/04 06:53:07 UTC, 0 replies.
- Limiting Pyspark.daemons - posted by ar7 <as...@gmail.com> on 2016/07/04 07:15:47 UTC, 11 replies.
- Re: java.io.FileNotFoundException - posted by kishore kumar <ak...@gmail.com> on 2016/07/04 08:57:10 UTC, 7 replies.
- Specifying Fixed Duration (Spot Block) for AWS Spark EC2 Cluster - posted by nsharkey <ni...@gmail.com> on 2016/07/04 13:59:52 UTC, 0 replies.
- How to handle update/deletion in Structured Streaming? - posted by Arnaud Bailly <ar...@gmail.com> on 2016/07/04 15:23:29 UTC, 1 replies.
- Spark application doesn't scale to worker nodes - posted by Jakub Stransky <st...@gmail.com> on 2016/07/04 15:31:51 UTC, 17 replies.
- pyspark aggregate vectors from onehotencoder - posted by Sebastian Kuepers <se...@publicispixelpark.de> on 2016/07/04 19:40:03 UTC, 0 replies.
- log traces - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/04 20:56:59 UTC, 5 replies.
- Spark MLlib: MultilayerPerceptronClassifier error? - posted by mshiryae <mi...@intel.com> on 2016/07/04 21:09:56 UTC, 3 replies.
- Pregel algorithm edge direction docs - posted by svjk24 <sv...@gmail.com> on 2016/07/05 02:32:40 UTC, 0 replies.
- Spark Dataframe validating column names - posted by Scott W <de...@gmail.com> on 2016/07/05 05:02:59 UTC, 2 replies.
- Is that possible to launch spark streaming application on yarn with only one machine? - posted by Yu Wei <yu...@hotmail.com> on 2016/07/05 06:31:04 UTC, 12 replies.
- How to Create a Database in Spark SQL - posted by lokeshyadav <lo...@gmail.com> on 2016/07/05 06:57:28 UTC, 1 replies.
- How Spark HA works - posted by Akmal Abbasov <ak...@icloud.com> on 2016/07/05 08:34:24 UTC, 0 replies.
- Re: Enforcing shuffle hash join - posted by 喜之郎 <25...@qq.com> on 2016/07/05 08:59:46 UTC, 0 replies.
- pyspark: dataframe.take is slow - posted by immerrr again <im...@gmail.com> on 2016/07/05 09:27:58 UTC, 0 replies.
- Dataframe sort - posted by tan shai <ta...@gmail.com> on 2016/07/05 10:57:53 UTC, 0 replies.
- StreamingKmeans Spark doesn't work at all - posted by Biplob Biswas <re...@gmail.com> on 2016/07/05 11:00:25 UTC, 0 replies.
- Spark MLlib: network intensive algorithms - posted by mshiryae <mi...@intel.com> on 2016/07/05 11:07:57 UTC, 0 replies.
- Read Kafka topic in a Spark batch job - posted by Bruckwald Tamás <ta...@freemail.hu> on 2016/07/05 12:15:23 UTC, 3 replies.
- Having issues of passing properties to Spark in 1.5 in comparison to 1.2 - posted by Nkechi Achara <nk...@googlemail.com> on 2016/07/05 13:40:49 UTC, 0 replies.
- Standalone mode resource allocation questions - posted by Jakub Stransky <st...@gmail.com> on 2016/07/05 14:18:15 UTC, 1 replies.
- Spark streaming. Strict discretizing by time - posted by rss rss <rs...@gmail.com> on 2016/07/05 15:02:53 UTC, 14 replies.
- remove row from data frame - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/05 15:38:33 UTC, 1 replies.
- SnappyData and Structured Streaming - posted by Benjamin Kim <bb...@gmail.com> on 2016/07/05 19:19:09 UTC, 4 replies.
- spark local dir to HDFS ? - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/07/05 22:47:37 UTC, 4 replies.
- where is open source Distributed service framework use for spark?? - posted by 另一片天 <95...@qq.com> on 2016/07/06 06:24:29 UTC, 0 replies.
- Re: Spark Task failure with File segment length as negative - posted by Priya Ch <le...@gmail.com> on 2016/07/06 08:01:22 UTC, 0 replies.
- streaming new data into bigger parquet file - posted by Igor Berman <ig...@gmail.com> on 2016/07/06 09:27:21 UTC, 0 replies.
- how to select first 50 value of each group after group by? - posted by lu...@sina.com on 2016/07/06 11:07:17 UTC, 4 replies.
- It seemed JavaDStream.print() did not work when launching via yarn on a single node - posted by Yu Wei <yu...@hotmail.com> on 2016/07/06 11:35:52 UTC, 7 replies.
- Spark Left outer Join issue using programmatic sql joins - posted by Radha krishna <gr...@gmail.com> on 2016/07/06 12:29:36 UTC, 10 replies.
- spark 2.0 bloom filters - posted by matd <ma...@gmail.com> on 2016/07/06 13:23:50 UTC, 0 replies.
- SparkR | Exception in invokeJava: SparkR + Windows standalone cluster - posted by AC24 <an...@gmail.com> on 2016/07/06 14:08:58 UTC, 0 replies.
- Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc") - posted by Dragisa Krsmanovic <dr...@ticketfly.com> on 2016/07/06 17:05:23 UTC, 4 replies.
- Maintain complete state for updateStateByKey - posted by Sunita Arvind <su...@gmail.com> on 2016/07/06 17:40:46 UTC, 0 replies.
- spark classloader question - posted by Chen Song <ch...@gmail.com> on 2016/07/06 18:28:07 UTC, 5 replies.
- Is Spark suited for replacing a batch job using many database tables? - posted by dabuki <da...@gmail.com> on 2016/07/06 19:25:52 UTC, 12 replies.
- Presentation in London: Running Spark on Hive or Hive on Spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/06 21:37:52 UTC, 6 replies.
- Logs of spark driver in yarn-client mode. - posted by Egor Pahomov <pa...@gmail.com> on 2016/07/07 00:25:26 UTC, 0 replies.
- Structured Streaming Comparison to AMPS - posted by craigjar <cr...@gmail.com> on 2016/07/07 03:19:24 UTC, 2 replies.
- Question regarding structured data and partitions - posted by Omid Alipourfard <ec...@gmail.com> on 2016/07/07 03:55:30 UTC, 4 replies.
- Processing json document - posted by Lan Jiang <lj...@gmail.com> on 2016/07/07 05:48:30 UTC, 8 replies.
- Re: MLLib SVMWithSGD is failing for large dataset - posted by Chitturi Padma <le...@gmail.com> on 2016/07/07 06:17:26 UTC, 0 replies.
- SparkSQL Added file get Exception: is a directory and recursive is not turned on - posted by linxi zeng <li...@gmail.com> on 2016/07/07 06:18:51 UTC, 0 replies.
- SPARK-8813 - combining small files in spark sql - posted by Ajay Srivastava <a_...@yahoo.com.INVALID> on 2016/07/07 06:53:08 UTC, 0 replies.
- Spark with HBase Error - Py4JJavaError - posted by Puneet Tripathi <Pu...@dunnhumby.com> on 2016/07/07 07:11:33 UTC, 3 replies.
- stddev_samp() gives NaN - posted by Mungeol Heo <mu...@gmail.com> on 2016/07/07 08:23:37 UTC, 8 replies.
- Optimize filter operations with sorted data - posted by tan shai <ta...@gmail.com> on 2016/07/07 09:25:07 UTC, 5 replies.
- Spark streaming Kafka Direct API + Multiple consumers - posted by SamyaMaiti <sa...@gmail.com> on 2016/07/07 09:34:34 UTC, 1 replies.
- Multiple aggregations over streaming dataframes - posted by Arnaud Bailly <ar...@gmail.com> on 2016/07/07 10:18:09 UTC, 7 replies.
- ClassNotFoundException: org.apache.parquet.hadoop.ParquetOutputCommitter - posted by kevin <ki...@gmail.com> on 2016/07/07 10:47:11 UTC, 1 replies.
- RDD and Dataframes - posted by brccosta <br...@gmail.com> on 2016/07/07 11:20:14 UTC, 4 replies.
- Re: Re: how to select first 50 value of each group after group by? - posted by lu...@sina.com on 2016/07/07 11:26:07 UTC, 0 replies.
- problem extracting map from json - posted by Michal Vince <vi...@gmail.com> on 2016/07/07 12:18:03 UTC, 1 replies.
- categoricalFeaturesInfo - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/07 13:12:21 UTC, 0 replies.
- Extend Dataframe API - posted by tan shai <ta...@gmail.com> on 2016/07/07 13:31:49 UTC, 3 replies.
- Spark 1.6.2 short circuit AND filter broken - posted by Patrick Woody <pa...@gmail.com> on 2016/07/07 15:10:19 UTC, 0 replies.
- Re: StreamingKmeans Spark doesn't work at all - posted by Biplob Biswas <re...@gmail.com> on 2016/07/07 15:21:51 UTC, 3 replies.
- spark streaming: how come I have scheduling delay when processing time is less then batch windowing size - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/07 17:47:21 UTC, 1 replies.
- Spark as sql engine on S3 - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/07/07 17:50:35 UTC, 5 replies.
- spark read from http endpoint? - posted by Robert Towne <Ro...@WebTrends.com> on 2016/07/07 20:39:53 UTC, 0 replies.
- is dataframe.write() async? Streaming performance problem - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/07 20:59:36 UTC, 2 replies.
- Re: Compute pairwise distance - posted by Manoj Awasthi <aw...@gmail.com> on 2016/07/08 03:13:52 UTC, 1 replies.
- Any ways to connect BI tool to Spark without Hive - posted by Chanh Le <gi...@gmail.com> on 2016/07/08 03:19:13 UTC, 8 replies.
- Re: Custom Spark Error on Hadoop Cluster - posted by Xiangrui Meng <me...@databricks.com> on 2016/07/08 05:32:01 UTC, 2 replies.
- Re: Re: Re: how to select first 50 value of each group after group by? - posted by lu...@sina.com on 2016/07/08 06:20:29 UTC, 0 replies.
- Memory grows exponentially - posted by "aasish.kumar" <aa...@avekshaa.com> on 2016/07/08 06:56:51 UTC, 2 replies.
- Bug about reading parquet files - posted by Sea <26...@qq.com> on 2016/07/08 08:33:10 UTC, 1 replies.
- How to improve the performance for writing a data frame to a JDBC database? - posted by Mungeol Heo <mu...@gmail.com> on 2016/07/08 09:23:07 UTC, 0 replies.
- Why is KafkaUtils.createRDD offsetRanges an Array rather than a Seq? - posted by Mikael Ståldal <mi...@magine.com> on 2016/07/08 09:42:08 UTC, 2 replies.
- Is the operation inside foreachRDD supposed to be blocking? - posted by Mikael Ståldal <mi...@magine.com> on 2016/07/08 09:43:12 UTC, 4 replies.
- - posted by tan shai <ta...@gmail.com> on 2016/07/08 10:58:27 UTC, 0 replies.
- RangePartitioning - posted by tan shai <ta...@gmail.com> on 2016/07/08 11:39:14 UTC, 0 replies.
- Simultaneous spark Jobs execution. - posted by Mazen <ma...@gmail.com> on 2016/07/08 12:03:11 UTC, 2 replies.
- Re: Bug about reading parquet files - posted by Sea <26...@qq.com> on 2016/07/08 12:44:46 UTC, 1 replies.
- spark logging best practices - posted by vimal dinakaran <vi...@gmail.com> on 2016/07/08 12:56:36 UTC, 0 replies.
- Spark Terasort Help - posted by Punit Naik <na...@gmail.com> on 2016/07/08 15:57:17 UTC, 0 replies.
- Iterate over columns in sql.dataframe - posted by Pasquinell Urbani <pa...@exalitica.com> on 2016/07/08 16:08:55 UTC, 0 replies.
- can I use ExectorService in my driver? was: is dataframe.write() async? Streaming performance problem - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/08 17:29:12 UTC, 1 replies.
- Unresponsive Spark Streaming UI in YARN cluster mode - 1.5.2 - posted by "Ellis, Tom (Financial Markets IT)" <To...@LloydsBanking.com.INVALID> on 2016/07/08 17:30:11 UTC, 4 replies.
- DataFrame Min By Column - posted by Pedro Rodriguez <sk...@gmail.com> on 2016/07/08 19:57:10 UTC, 6 replies.
- Isotonic Regression, run method overloaded Error - posted by dsp <du...@gmail.com> on 2016/07/08 20:38:15 UTC, 4 replies.
- Broadcast hash join implementation in Spark - posted by Lalitha MV <la...@gmail.com> on 2016/07/08 23:50:27 UTC, 2 replies.
- Spark performance testing - posted by Andrew Ehrlich <an...@aehrlich.com> on 2016/07/09 03:40:58 UTC, 3 replies.
- Is there a way to dynamic load files [ parquet or csv ] in the map function? - posted by charles li <ch...@gmail.com> on 2016/07/09 03:52:23 UTC, 1 replies.
- Re: Spark 2.0 Release Date - posted by "Taotao.Li" <ch...@gmail.com> on 2016/07/09 11:02:40 UTC, 0 replies.
- problem making Zeppelin 0.6 work with Spark 1.6.1, throwing jackson.databind.JsonMappingException exception - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/09 14:25:14 UTC, 3 replies.
- Spark application Runtime Measurement - posted by Fei Hu <hu...@gmail.com> on 2016/07/10 03:57:58 UTC, 2 replies.
- location of a partition in the cluster/ how parallelize method distribute the RDD partitions over the cluster. - posted by Mazen <ma...@gmail.com> on 2016/07/10 12:58:16 UTC, 1 replies.
- IS NOT NULL is not working in programmatic SQL in spark - posted by Radha krishna <gr...@gmail.com> on 2016/07/10 14:19:48 UTC, 5 replies.
- How to Register Permanent User-Defined-Functions (UDFs) in SparkSQL - posted by Lokesh Yadav <lo...@gmail.com> on 2016/07/10 16:14:02 UTC, 1 replies.
- KEYS file? - posted by Phil Steitz <ph...@gmail.com> on 2016/07/10 16:57:24 UTC, 5 replies.
- Network issue on deployment - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/10 17:26:27 UTC, 2 replies.
- "client / server" config - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/11 02:33:37 UTC, 0 replies.
- Spark crashes with two parquet files - posted by Javier Rey <jr...@gmail.com> on 2016/07/11 02:42:40 UTC, 2 replies.
- How to run Zeppelin and Spark Thrift Server Together - posted by Chanh Le <gi...@gmail.com> on 2016/07/11 02:48:26 UTC, 10 replies.
- Re: "client / server" config - posted by ayan guha <gu...@gmail.com> on 2016/07/11 02:55:56 UTC, 2 replies.
- mllib based on dataset or dataframe - posted by jinhong lu <lu...@gmail.com> on 2016/07/11 03:35:51 UTC, 1 replies.
- Spark logging - posted by SamyaMaiti <sa...@gmail.com> on 2016/07/11 03:46:34 UTC, 0 replies.
- Problem connecting Zeppelin 0.6 to Spark Thrift Server - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/11 04:48:49 UTC, 4 replies.
- Re: Connection via JDBC to Oracle hangs after count call - posted by Mark Vervuurt <m....@gmail.com> on 2016/07/11 07:25:43 UTC, 3 replies.
- Zeppelin Spark with Dynamic Allocation - posted by Chanh Le <gi...@gmail.com> on 2016/07/11 08:09:52 UTC, 2 replies.
- question about UDAF - posted by lu...@sina.com on 2016/07/11 09:04:52 UTC, 1 replies.
- Spark job state is EXITED but does not return - posted by "Balachandar R.A." <ba...@gmail.com> on 2016/07/11 12:07:40 UTC, 0 replies.
- Re: Using Spark on Hive with Hive also using Spark as its execution engine - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/07/11 14:22:58 UTC, 14 replies.
- Spark hangs at "Removed broadcast_*" - posted by velvetbaldmime <ke...@gmail.com> on 2016/07/11 14:50:21 UTC, 3 replies.
- spark UI what does storage memory x/y mean - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/11 15:52:59 UTC, 0 replies.
- WARN FileOutputCommitter: Failed to delete the temporary output directory of task: attempt_201607111453_128606_m_000000_0 - s3n:// - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/11 16:00:39 UTC, 0 replies.
- Marking files as read in Spark Streaming - posted by soumick dasgupta <so...@gmail.com> on 2016/07/11 16:24:22 UTC, 0 replies.
- What is the maximum number of column being supported by apache spark dataframe - posted by Zijing Guo <al...@yahoo.com.INVALID> on 2016/07/11 16:32:19 UTC, 0 replies.
- Using accumulators in Local mode for testing - posted by harelglik <ha...@gmail.com> on 2016/07/11 16:42:26 UTC, 0 replies.
- Long-running tree-aggregation causes java.lang.OutOfMemoryError: Metaspace - posted by Daniel Imberman <da...@gmail.com> on 2016/07/11 16:46:28 UTC, 2 replies.
- Question on Spark shell - posted by Sivakumaran S <si...@me.com> on 2016/07/11 17:47:27 UTC, 4 replies.
- Saving Table with Special Characters in Columns - posted by Tobi Bosede <an...@gmail.com> on 2016/07/11 18:16:16 UTC, 2 replies.
- Run Stored Procedures from Spark SqlContext - posted by zachkirsch <t-...@microsoft.com> on 2016/07/11 18:54:44 UTC, 0 replies.
- trouble accessing driver log files using rest-api - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/11 20:14:44 UTC, 0 replies.
- Spark SQL: Merge Arrays/Sets - posted by Pedro Rodriguez <sk...@gmail.com> on 2016/07/11 20:40:39 UTC, 2 replies.
- Processing ion formatted messages in spark - posted by pandees waran <pa...@gmail.com> on 2016/07/11 20:41:07 UTC, 0 replies.
- Spark Streaming - Direct Approach - posted by "Mail.com" <pr...@mail.com> on 2016/07/11 20:43:05 UTC, 3 replies.
- Error starting thrift server on Spark - posted by Marco Colombo <in...@gmail.com> on 2016/07/11 21:07:25 UTC, 1 replies.
- /spark-ec2 script: trouble using ganglia web ui spark 1.6.1 - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/11 21:49:37 UTC, 0 replies.
- Spark cluster tuning recommendation - posted by Kartik Mathur <ka...@bluedata.com> on 2016/07/11 22:09:40 UTC, 3 replies.
- QuantileDiscretizer not working properly with big dataframes - posted by Pasquinell Urbani <pa...@exalitica.com> on 2016/07/11 22:28:14 UTC, 2 replies.
- chisqSelector in Python - posted by Tobi Bosede <an...@gmail.com> on 2016/07/11 23:12:43 UTC, 0 replies.
- Re: Batch details are missing - posted by "C. Josephson" <cj...@uhana.io> on 2016/07/11 23:52:14 UTC, 0 replies.
- Complications with saving Kafka offsets? - posted by BradleyUM <br...@unionmetrics.com> on 2016/07/12 02:58:37 UTC, 1 replies.
- Re: Fast database with writes per second and horizontal scaling - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/07/12 05:01:32 UTC, 3 replies.
- Matrix Factorization Model model.save error "NullPointerException" - posted by "Zhou (Joe) Xing" <jo...@nextev.com> on 2016/07/12 06:27:50 UTC, 1 replies.
- Spark streaming graceful shutdown when running on yarn-cluster deploy-mode - posted by Guy Harmach <Gu...@Amdocs.com> on 2016/07/12 06:56:58 UTC, 0 replies.
- Re: Re: question about UDAF - posted by lu...@sina.com on 2016/07/12 08:10:05 UTC, 0 replies.
- Handling categorical variables in StreamingLogisticRegressionwithSGD - posted by kundan kumar <ii...@gmail.com> on 2016/07/12 09:21:06 UTC, 2 replies.
- Error in Spark job - posted by Saurav Sinha <sa...@gmail.com> on 2016/07/12 10:08:32 UTC, 1 replies.
- Large files with wholetextfile() - posted by Bahubali Jain <ba...@gmail.com> on 2016/07/12 12:54:28 UTC, 2 replies.
- Send real-time alert using Spark - posted by Priya Ch <le...@gmail.com> on 2016/07/12 13:25:00 UTC, 2 replies.
- ml and mllib persistence - posted by "aka.fe2s" <ak...@gmail.com> on 2016/07/12 14:40:36 UTC, 2 replies.
- RDD for loop vs foreach - posted by philipghu <ph...@gmail.com> on 2016/07/12 15:58:31 UTC, 2 replies.
- Feature importance IN random forest - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/12 17:30:34 UTC, 1 replies.
- Output Op Duration vs Job Duration: What's the difference? - posted by Renxia Wang <re...@gmail.com> on 2016/07/12 17:35:13 UTC, 0 replies.
- Re: bisecting kmeans model tree - posted by roni <ro...@gmail.com> on 2016/07/12 18:45:22 UTC, 1 replies.
- Tools for Balancing Partitions by Size - posted by Pedro Rodriguez <sk...@gmail.com> on 2016/07/12 19:53:33 UTC, 5 replies.
- Re: Spark hangs at "Removed broadcast_*" - posted by Sea <26...@qq.com> on 2016/07/13 03:04:42 UTC, 0 replies.
- Inode for STS - posted by ayan guha <gu...@gmail.com> on 2016/07/13 03:54:57 UTC, 4 replies.
- Re: Spark cache behaviour when the source table is modified - posted by Chanh Le <gi...@gmail.com> on 2016/07/13 04:04:03 UTC, 0 replies.
- Spark Streaming: Refreshing broadcast value after each batch - posted by Daniel Haviv <da...@veracity-group.com> on 2016/07/13 05:19:01 UTC, 0 replies.
- Any Idea about this error : IllegalArgumentException: File segment length cannot be negative ? - posted by Dibyendu Bhattacharya <di...@gmail.com> on 2016/07/13 06:26:49 UTC, 0 replies.
- Spark Thrift Server performance - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/13 07:33:43 UTC, 4 replies.
- Problem saving Hive table with Overwrite mode - posted by nimrodo <ni...@veracity-group.com> on 2016/07/13 07:35:11 UTC, 0 replies.
- Spark, Kryo Serialization Issue with ProtoBuf field - posted by Nkechi Achara <nk...@googlemail.com> on 2016/07/13 08:30:13 UTC, 0 replies.
- Re: Issue with Spark on 25 nodes cluster - posted by ANDREA SPINA <74...@studenti.unimore.it> on 2016/07/13 09:20:01 UTC, 0 replies.
- Flume integration - posted by Ian Brooks <i....@sensewhere.com> on 2016/07/13 10:13:26 UTC, 0 replies.
- Dependencies with runing Spark Streaming on Mesos cluster using Python - posted by Luke Adolph <ke...@gmail.com> on 2016/07/13 10:57:03 UTC, 2 replies.
- Spark 7736 - posted by ayan guha <gu...@gmail.com> on 2016/07/13 11:47:00 UTC, 1 replies.
- When worker is killed driver continues to run causing issues in supervise mode - posted by Noorul Islam K M <no...@noorul.com> on 2016/07/13 12:08:31 UTC, 1 replies.
- Online evaluation of MLLIB model - posted by Danilo Rizzo <da...@gmail.com> on 2016/07/13 15:57:08 UTC, 1 replies.
- Issue in spark job. Remote rpc client dissociated - posted by "Balachandar R.A." <ba...@gmail.com> on 2016/07/13 16:44:48 UTC, 4 replies.
- Spark Website - posted by Benjamin Kim <bb...@gmail.com> on 2016/07/13 18:45:07 UTC, 5 replies.
- Re: Severe Spark Streaming performance degradation after upgrading to 1.6.1 - posted by Sunita <su...@gmail.com> on 2016/07/13 19:23:10 UTC, 3 replies.
- Structured Streaming and Microbatches - posted by Matthias Niehoff <ma...@codecentric.de> on 2016/07/13 20:35:13 UTC, 1 replies.
- Dense Vectors outputs in feature engineering - posted by rachmaninovquartet <ra...@gmail.com> on 2016/07/13 21:37:32 UTC, 4 replies.
- Spark HBase bulk load using hfile format - posted by yeshwanth kumar <ye...@gmail.com> on 2016/07/13 23:02:28 UTC, 0 replies.
- Doing record linkage using string comparators in Spark - posted by Linh Tran <li...@apple.com> on 2016/07/13 23:43:18 UTC, 0 replies.
- Memory issue java.lang.OutOfMemoryError: Java heap space - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/14 01:14:46 UTC, 6 replies.
- Is it possible to send CSVSink metrics to HDFS - posted by johnbutcher <j....@se15.qmul.ac.uk> on 2016/07/14 11:09:59 UTC, 0 replies.
- ranks and cubes - posted by talgr <ta...@gmail.com> on 2016/07/14 14:32:15 UTC, 1 replies.
- Call http request from within Spark - posted by Amit Dutta <am...@outlook.com> on 2016/07/14 14:52:55 UTC, 3 replies.
- Standalone cluster node utilization - posted by Jakub Stransky <st...@gmail.com> on 2016/07/14 16:18:49 UTC, 8 replies.
- Difference JavaReceiverInputDStream and JavaDStream - posted by Paolo Patierno <pp...@live.com> on 2016/07/14 16:20:23 UTC, 0 replies.
- repartitionAndSortWithinPartitions HELP - posted by Punit Naik <na...@gmail.com> on 2016/07/14 17:09:30 UTC, 9 replies.
- Spark Streaming Kinesis Performance Decrease When Cluster Scale Up with More Executors - posted by Renxia Wang <re...@gmail.com> on 2016/07/14 17:49:12 UTC, 3 replies.
- HiveThriftServer and spark.sql.hive.thriftServer.singleSession setting - posted by Chang Lim <ch...@gmail.com> on 2016/07/14 19:08:43 UTC, 2 replies.
- Filtering RDD Using Spark.mllib's ChiSqSelector - posted by Tobi Bosede <an...@gmail.com> on 2016/07/14 20:23:38 UTC, 4 replies.
- Maximum Size of Reference Look Up Table in Spark - posted by Saravanan Subramanian <to...@yahoo.com.INVALID> on 2016/07/14 21:32:09 UTC, 3 replies.
- SparkStreaming multiple output operations failure semantics / error propagation - posted by Martin Eden <ma...@gmail.com> on 2016/07/14 22:04:37 UTC, 0 replies.
- Saving data frames on Spark Master/Driver - posted by "vr.n. nachiappan" <na...@yahoo.com.INVALID> on 2016/07/14 22:15:40 UTC, 4 replies.
- How to check if a data frame is cached? - posted by Cesar <ce...@gmail.com> on 2016/07/15 00:33:49 UTC, 0 replies.
- How to recommend most similar users using Spark ML - posted by jeremycod <zo...@gmail.com> on 2016/07/15 03:36:52 UTC, 2 replies.
- find two consective points - posted by Divya Gehlot <di...@gmail.com> on 2016/07/15 06:36:59 UTC, 0 replies.
- Getting error in inputfile | inputFile - posted by RK Spark <rk...@gmail.com> on 2016/07/15 06:42:47 UTC, 2 replies.
- scala.MatchError on stand-alone cluster mode - posted by Mekal Zheng <me...@gmail.com> on 2016/07/15 07:17:17 UTC, 2 replies.
- Input path does not exist error in giving input file for word count program - posted by RK Spark <rk...@gmail.com> on 2016/07/15 08:05:36 UTC, 1 replies.
- Random Forest Job got killed (DAGScheduler: failed: Set() , DecisionTree.scala:642), which has no missing parents) - posted by Ascot Moss <as...@gmail.com> on 2016/07/15 12:02:54 UTC, 0 replies.
- Random Forest gererate model failed (DecisionTree.scala:642), which has no missing parents - posted by Ascot Moss <as...@gmail.com> on 2016/07/15 12:14:17 UTC, 0 replies.
- spark.executor.cores - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/15 12:31:02 UTC, 10 replies.
- XML - posted by "VON RUEDEN, Jonathan" <jo...@sap.com> on 2016/07/15 12:44:17 UTC, 0 replies.
- Error starting HiveServer2: could not start ThriftBinaryCLIService - posted by ram kumar <ra...@gmail.com> on 2016/07/15 13:24:08 UTC, 0 replies.
- How to verify in Spark 1.6.x usage, User Memory used after Cache table - posted by Yogesh Rajak <yr...@infocepts.com> on 2016/07/15 13:44:35 UTC, 0 replies.
- standalone mode only supports FIFO scheduler across applications ? still in spark 2.0 time ? - posted by Teng Qiu <te...@gmail.com> on 2016/07/15 16:15:28 UTC, 2 replies.
- many 'activity' job are pending - posted by 陆巍|Wei Lu（RD） <we...@99bill.com> on 2016/07/15 16:17:41 UTC, 0 replies.
- Custom InputFormat (SequenceFileInputFormat vs FileInputFormat) - posted by jtgenesis <jt...@gmail.com> on 2016/07/15 17:31:44 UTC, 1 replies.
- spark single PROCESS_LOCAL task - posted by Matt K <ma...@gmail.com> on 2016/07/15 17:57:02 UTC, 1 replies.
- How to convert from DataFrame to Dataset[Row]? - posted by Daniel Barclay <da...@gmail.com> on 2016/07/15 19:21:42 UTC, 2 replies.
- How can we control CPU and Memory per Spark job operation.. - posted by Pavan Achanta <pa...@sysomos.com> on 2016/07/15 19:54:44 UTC, 4 replies.
- Streaming from Kinesis is not getting data in Yarn cluster - posted by dharmendra <d1...@gmail.com> on 2016/07/15 20:04:54 UTC, 1 replies.
- Re: java.lang.OutOfMemoryError related to Graphframe bfs - posted by RK Aduri <rk...@collectivei.com> on 2016/07/15 23:57:52 UTC, 0 replies.
- Re: Spark Streaming - Best Practices to handle multiple datapoints arriving at different time interval - posted by RK Aduri <rk...@collectivei.com> on 2016/07/16 00:02:48 UTC, 1 replies.
- Size of cached dataframe - posted by Brandon White <bw...@gmail.com> on 2016/07/16 03:18:51 UTC, 0 replies.
- Spark streaming takes longer time to read json into dataframes - posted by Diwakar Dhanuskodi <di...@gmail.com> on 2016/07/16 03:43:54 UTC, 7 replies.
- Trouble while running spark at ec2 cluster - posted by Hassaan Chaudhry <mh...@gmail.com> on 2016/07/16 04:32:56 UTC, 1 replies.
- Latest 200 messages per topic - posted by Rabin Banerjee <de...@gmail.com> on 2016/07/16 15:38:31 UTC, 4 replies.
- High availability with Spark - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/07/16 16:13:41 UTC, 0 replies.
- Fwd: File to read sharded (2 levels) parquet files - posted by Pei Sun <pe...@alluxio.com> on 2016/07/16 21:01:25 UTC, 0 replies.
- Spark (on Windows) not picking up HADOOP_CONF_DIR - posted by Daniel Haviv <da...@veracity-group.com> on 2016/07/17 09:33:08 UTC, 1 replies.
- unsubscribe - posted by "Burger, Robert" <Ro...@td.com> on 2016/07/17 12:43:12 UTC, 3 replies.
- Dataframe Transformation with Inner fields in Complex Datatypes. - posted by java bigdata <ha...@gmail.com> on 2016/07/17 16:16:47 UTC, 2 replies.
- How to use Spark scala custom UDF in spark sql CLI or beeline client - posted by pooja mehta <sp...@gmail.com> on 2016/07/17 16:37:45 UTC, 0 replies.
- Error in using filter when using dataset API in java - posted by VG <vl...@gmail.com> on 2016/07/17 16:57:14 UTC, 1 replies.
- Question About OFF_HEAP Caching - posted by condor join <sp...@outlook.com> on 2016/07/18 07:11:45 UTC, 0 replies.
- Spark Job trigger in production - posted by manish jaiswal <ma...@gmail.com> on 2016/07/18 07:13:19 UTC, 5 replies.
- how to tuning spark shuffle - posted by leezy <li...@163.com> on 2016/07/18 07:16:17 UTC, 2 replies.
- Dynamically get value based on Map key in Spark Dataframe - posted by Divya Gehlot <di...@gmail.com> on 2016/07/18 07:23:34 UTC, 3 replies.
- pyspark 1.5 0 save model ? - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/18 09:58:10 UTC, 1 replies.
- the spark job is so slow - almost frozen - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/07/18 10:04:34 UTC, 6 replies.
- Spark driver getting out of memory - posted by Saurav Sinha <sa...@gmail.com> on 2016/07/18 10:31:34 UTC, 8 replies.
- Concatenate the columns in dataframe to create new collumns using Java - posted by Abhishek Anand <ab...@gmail.com> on 2016/07/18 10:45:18 UTC, 4 replies.
- Re: Question About OFF_HEAP Caching - posted by Gene Pang <ge...@gmail.com> on 2016/07/18 13:36:19 UTC, 1 replies.
- transtition SQLContext to SparkSession - posted by Koert Kuipers <ko...@tresata.com> on 2016/07/18 15:37:42 UTC, 5 replies.
- Increasing spark.yarn.executor.memoryOverhead degrades performance - posted by Sunita Arvind <su...@gmail.com> on 2016/07/18 15:47:53 UTC, 1 replies.
- spark-submit local and Akka startup timeouts - posted by Rory Waite <rw...@sdl.com> on 2016/07/18 16:34:04 UTC, 3 replies.
- Execute function once on each node - posted by joshuata <jo...@gmail.com> on 2016/07/18 21:57:42 UTC, 7 replies.
- ApacheCon: Getting the word out internally - posted by Melissa Warnkin <mi...@yahoo.com.INVALID> on 2016/07/18 23:15:09 UTC, 0 replies.
- Unsubscribe - posted by Jinan Alhajjaj <j....@hotmail.com> on 2016/07/19 03:33:56 UTC, 3 replies.
- the spark job is so slow during shuffle - almost frozen - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/07/19 03:52:27 UTC, 0 replies.
- Scala code as "spark view" - posted by wdaehn <we...@sap.com> on 2016/07/19 08:26:21 UTC, 0 replies.
- which one spark ml or spark mllib - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/19 09:55:19 UTC, 1 replies.
- I'm trying to understand how to compile Spark - posted by Eli Super <el...@gmail.com> on 2016/07/19 11:22:38 UTC, 3 replies.
- Error in Word Count Program - posted by RK Spark <rk...@gmail.com> on 2016/07/19 11:30:30 UTC, 1 replies.
- Spark ResourceLeak?? - posted by Guruji <sa...@gmail.com> on 2016/07/19 13:11:06 UTC, 1 replies.
- Spark ResourceLeak? - posted by saurabh guru <sa...@gmail.com> on 2016/07/19 13:12:19 UTC, 0 replies.
- Building standalone spark application via sbt - posted by Sachin Mittal <sj...@gmail.com> on 2016/07/19 14:09:58 UTC, 7 replies.
- Is it good choice to use DAO to store results generated by spark application? - posted by Yu Wei <yu...@hotmail.com> on 2016/07/19 16:34:15 UTC, 10 replies.
- Re: how to setup the development environment of spark with IntelliJ on ubuntu - posted by joshuata <jo...@gmail.com> on 2016/07/19 18:25:08 UTC, 0 replies.
- Strange behavior including memory leak and NPE - posted by rachmaninovquartet <ra...@gmail.com> on 2016/07/19 18:53:31 UTC, 0 replies.
- Little idea needed - posted by Aakash Basu <ra...@gmail.com> on 2016/07/19 19:27:57 UTC, 6 replies.
- Re: Task not serializable: java.io.NotSerializableException: org.json4s.Serialization$$anon$1 - posted by joshuata <jo...@gmail.com> on 2016/07/19 20:51:25 UTC, 1 replies.
- Role-based S3 access outside of EMR - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/07/19 21:30:10 UTC, 9 replies.
- Missing Exector Logs From Yarn After Spark Failure - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/07/19 22:23:20 UTC, 2 replies.
- Saving a pyspark.ml.feature.PCA model - posted by Ajinkya Kale <ka...@gmail.com> on 2016/07/19 22:54:06 UTC, 5 replies.
- Heavy Stage Concentration - Ends With Failure - posted by Aaron Jackson <aj...@pobox.com> on 2016/07/20 00:16:47 UTC, 2 replies.
- HiveContext , difficulties in accessing tables in hive schema's/database's other than default database. - posted by satyajit vegesna <sa...@gmail.com> on 2016/07/20 01:23:58 UTC, 0 replies.
- spark worker continuously trying to connect to master and failed in standalone mode - posted by Neil Chang <ia...@gmail.com> on 2016/07/20 01:25:13 UTC, 3 replies.
- Should it be safe to embed Spark in Local Mode? - posted by Brett Randall <ja...@gmail.com> on 2016/07/20 02:35:01 UTC, 1 replies.
- Storm HDFS bolt equivalent in Spark Streaming. - posted by Ra...@DellTeam.com on 2016/07/20 04:18:40 UTC, 3 replies.
- Extremely slow shuffle writes and large job time fluxuations - posted by Jon Chase <jo...@gmail.com> on 2016/07/20 04:30:59 UTC, 0 replies.
- Running multiple Spark Jobs on Yarn( Client mode) - posted by Vaibhav Nagpal <va...@gmail.com> on 2016/07/20 06:00:16 UTC, 2 replies.
- write and call UDF in spark dataframe - posted by Divya Gehlot <di...@gmail.com> on 2016/07/20 07:14:47 UTC, 10 replies.
- How spark decides whether to do BroadcastHashJoin or SortMergeJoin - posted by raaggarw <ra...@adobe.com> on 2016/07/20 08:07:42 UTC, 1 replies.
- XLConnect in SparkR - posted by Yogesh Vyas <in...@gmail.com> on 2016/07/20 08:42:36 UTC, 3 replies.
- run spark apps in linux crontab - posted by lu...@sina.com on 2016/07/20 10:00:37 UTC, 7 replies.
- lift coefficien - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/20 13:05:19 UTC, 1 replies.
- Spark 1.6.2 Spark-SQL RACK_LOCAL - posted by chandana <ch...@gmail.com> on 2016/07/20 13:45:37 UTC, 0 replies.
- Snappy initialization issue, spark assembly jar missing snappy classes? - posted by Eugene Morozov <ev...@gmail.com> on 2016/07/20 14:01:43 UTC, 0 replies.
- ML PipelineModel to be scored locally - posted by Simone Miraglia <si...@gmail.com> on 2016/07/20 14:08:41 UTC, 3 replies.
- Best practices to restart Spark jobs programatically from driver itself - posted by unk1102 <um...@gmail.com> on 2016/07/20 15:11:53 UTC, 1 replies.
- difference between two consecutive rows of same column + spark + dataframe - posted by Divya Gehlot <di...@gmail.com> on 2016/07/20 15:50:43 UTC, 0 replies.
- How to connect HBase and Spark using Python? - posted by Def_Os <nj...@gmail.com> on 2016/07/20 16:32:45 UTC, 4 replies.
- RandomForestClassifier - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/20 17:03:24 UTC, 1 replies.
- Understanding spark concepts cluster, master, slave, job, stage, worker, executor, task - posted by Sachin Mittal <sj...@gmail.com> on 2016/07/20 17:30:59 UTC, 15 replies.
- Re: OutOfMemory when doing joins in spark 2.0 while same code runs fine in spark 1.5.2 - posted by Ian O'Connell <ia...@ianoconnell.com> on 2016/07/20 17:35:03 UTC, 1 replies.
- Attribute name "sum(proceeds)" contains invalid character(s) among " ,;{}()\n\t=" - posted by Chanh Le <gi...@gmail.com> on 2016/07/20 17:39:39 UTC, 0 replies.
- MultiThreading in Spark 1.6.0 - posted by RK Aduri <rk...@collectivei.com> on 2016/07/20 18:32:01 UTC, 2 replies.
- Using multiple data sources in one stream - posted by Joe Panciera <jo...@gmail.com> on 2016/07/20 19:54:43 UTC, 0 replies.
- SparkWebUI and Master URL on EC2 - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/07/20 20:09:13 UTC, 2 replies.
- PySpark 2.0 Structured Streaming Question - posted by "A.W. Covert III" <co...@gmail.com> on 2016/07/20 20:22:38 UTC, 1 replies.
- Re: Subquery in having-clause (Spark 1.1.0) - posted by rickn <na...@gmail.com> on 2016/07/20 23:13:00 UTC, 0 replies.
- Understanding Spark UI DAGs - posted by "C. Josephson" <cj...@uhana.io> on 2016/07/21 00:56:18 UTC, 5 replies.
- Reply: Re:run spark apps in linux crontab - posted by lu...@sina.com on 2016/07/21 03:36:42 UTC, 0 replies.
- getting null when calculating time diff with unix_timestamp + spark 1.6 - posted by Divya Gehlot <di...@gmail.com> on 2016/07/21 03:50:52 UTC, 1 replies.
- calculate time difference between consecutive rows - posted by Divya Gehlot <di...@gmail.com> on 2016/07/21 04:27:25 UTC, 2 replies.
- Optimal Amount of Tasks Per size of data in memory - posted by Brandon White <bw...@gmail.com> on 2016/07/21 05:58:19 UTC, 0 replies.
- Reading multiple json files form nested folders for data frame - posted by Ashutosh Kumar <km...@gmail.com> on 2016/07/21 06:18:59 UTC, 5 replies.
- Reply: Re: run spark apps in linux crontab - posted by lu...@sina.com on 2016/07/21 06:48:29 UTC, 0 replies.
- Where is the SparkSQL Specification? - posted by Linyuxin <li...@huawei.com> on 2016/07/21 07:20:04 UTC, 1 replies.
- what contribute to Task Deserialization Time - posted by patcharee <Pa...@uni.no> on 2016/07/21 09:35:56 UTC, 2 replies.
- writing Kafka dstream to local flat file - posted by Puneet Tripathi <Pu...@dunnhumby.com> on 2016/07/21 10:46:49 UTC, 0 replies.
- Using RDD.checkpoint to recover app failure - posted by harelglik <ha...@gmail.com> on 2016/07/21 11:56:42 UTC, 1 replies.
- HiveThriftServer2.startWithContext no more showing tables in 1.6.2 - posted by Marco Colombo <in...@gmail.com> on 2016/07/21 13:55:11 UTC, 3 replies.
- init() and cleanup() for Spark map functions - posted by Amit Sela <am...@gmail.com> on 2016/07/21 14:11:45 UTC, 0 replies.
- Upgrading a Hive External Storage Handler... - posted by "Lavelle, Shawn" <Sh...@osii.com> on 2016/07/21 14:53:36 UTC, 0 replies.
- Load selected rows with sqlContext in the dataframe - posted by sujeet jog <su...@gmail.com> on 2016/07/21 14:59:15 UTC, 2 replies.
- spark and plot data - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/21 15:30:23 UTC, 11 replies.
- add hours to from_unixtimestamp - posted by Divya Gehlot <di...@gmail.com> on 2016/07/21 16:34:52 UTC, 0 replies.
- Programmatic use of UDFs from Java - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2016/07/21 17:10:40 UTC, 3 replies.
- spark.driver.extraJavaOptions - posted by SamyaMaiti <sa...@gmail.com> on 2016/07/21 17:10:46 UTC, 3 replies.
- how to resolve you must build spark with hive exception? - posted by Nomii5007 <in...@gmail.com> on 2016/07/21 17:58:48 UTC, 0 replies.
- Number of sortBy output partitions - posted by Simone Franzini <ca...@gmail.com> on 2016/07/21 20:58:14 UTC, 0 replies.
- Upgrade from 1.2 to 1.6 - parsing flat files in working directory - posted by Sumona Routh <su...@gmail.com> on 2016/07/21 22:43:08 UTC, 1 replies.
- How to submit app in cluster mode? port 7077 or 6066 - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/21 22:44:03 UTC, 2 replies.
- SVD output within Spark - posted by Martin Somers <so...@gmail.com> on 2016/07/21 22:50:08 UTC, 0 replies.
- NoClassDefFoundError with ZonedDateTime - posted by Ilya Ganelin <il...@gmail.com> on 2016/07/22 03:26:22 UTC, 5 replies.
- MLlib, Java, and DataFrame - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/22 03:41:47 UTC, 11 replies.
- Applying schema on single column dataframe in java - posted by raheel-akl <ra...@gmail.com> on 2016/07/22 03:54:24 UTC, 0 replies.
- Re: GraphX performance and settings - posted by B YL <by...@hotmail.com> on 2016/07/22 07:48:17 UTC, 1 replies.
- ml models distribution - posted by Sergio Fernández <wi...@apache.org> on 2016/07/22 09:49:49 UTC, 7 replies.
- Create dataframe column from list - posted by Divya Gehlot <di...@gmail.com> on 2016/07/22 11:45:31 UTC, 3 replies.
- Unresolved dependencies while creating spark application Jar - posted by janardhan shetty <ja...@gmail.com> on 2016/07/22 12:08:38 UTC, 5 replies.
- Dataset , RDD zipWithIndex -- How to use as a map . - posted by VG <vl...@gmail.com> on 2016/07/22 13:10:15 UTC, 3 replies.
- Is spark-submit a single point of failure? - posted by Sivakumaran S <si...@me.com> on 2016/07/22 13:46:59 UTC, 2 replies.
- ml ALS.fit(..) issue - posted by VG <vl...@gmail.com> on 2016/07/22 14:17:41 UTC, 2 replies.
- WrappedArray in SparkSQL DF - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/07/22 14:27:32 UTC, 0 replies.
- Creating a DataFrame from scratch - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/22 14:53:08 UTC, 3 replies.
- Rebalancing when adding kafka partitions - posted by Srikanth <sr...@gmail.com> on 2016/07/22 16:29:44 UTC, 5 replies.
- running jupyter notebook server Re: spark and plot data - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/22 17:07:32 UTC, 1 replies.
- Error in running JavaALSExample example from spark examples - posted by VG <vl...@gmail.com> on 2016/07/22 17:31:20 UTC, 8 replies.
- Re: Integration tests for Spark Streaming - posted by Lars Albertsson <la...@mapflat.com> on 2016/07/22 17:35:40 UTC, 0 replies.
- How to search on a Dataset / RDD - posted by VG <vl...@gmail.com> on 2016/07/22 18:21:56 UTC, 1 replies.
- Hive Exception - posted by Inam Ur Rehman <in...@gmail.com> on 2016/07/22 18:40:44 UTC, 0 replies.
- Spark, Scala, and DNA sequencing - posted by James McCabe <ja...@oranda.com> on 2016/07/22 19:31:44 UTC, 3 replies.
- Distributed Matrices - spark mllib - posted by Gourav Sengupta <go...@gmail.com> on 2016/07/22 20:14:20 UTC, 1 replies.
- How to get the number of partitions for a SparkDataFrame in Spark 2.0-preview? - posted by Neil Chang <ia...@gmail.com> on 2016/07/22 20:20:12 UTC, 4 replies.
- Fatal error when using broadcast variables and checkpointing in Spark Streaming - posted by Joe Panciera <jo...@gmail.com> on 2016/07/22 20:50:22 UTC, 1 replies.
- Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/22 21:17:35 UTC, 2 replies.
- ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. - posted by Ascot Moss <as...@gmail.com> on 2016/07/22 22:52:44 UTC, 5 replies.
- Using flatMap on Dataframes with Spark 2.0 - posted by Julien Nauroy <ju...@u-psud.fr> on 2016/07/23 12:36:20 UTC, 5 replies.
- spark context stop vs close - posted by "Mail.com" <pr...@mail.com> on 2016/07/23 13:11:27 UTC, 3 replies.
- Error in collecting RDD as a Map - IOException in collectAsMap - posted by VG <vl...@gmail.com> on 2016/07/23 13:37:29 UTC, 6 replies.
- Choosing RDD/DataFrame/DataSet and Cluster Tuning - posted by Jestin Ma <je...@gmail.com> on 2016/07/23 15:31:07 UTC, 1 replies.
- How to give name to Spark jobs shown in Spark UI - posted by unk1102 <um...@gmail.com> on 2016/07/23 15:47:22 UTC, 3 replies.
- SaveToCassandra executed when I stop Spark - posted by Fernando Avalos <ga...@gmail.com> on 2016/07/23 16:23:51 UTC, 1 replies.
- Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data - posted by VG <vl...@gmail.com> on 2016/07/23 18:37:48 UTC, 10 replies.
- How to generate a sequential key in rdd across executors - posted by yeshwanth kumar <ye...@gmail.com> on 2016/07/24 02:53:22 UTC, 3 replies.
- Size exceeds Integer.MAX_VALUE - posted by Ascot Moss <as...@gmail.com> on 2016/07/24 04:00:10 UTC, 3 replies.
- Maintaining order of pair rdd - posted by janardhan shetty <ja...@gmail.com> on 2016/07/24 05:22:55 UTC, 10 replies.
- Locality sensitive hashing - posted by janardhan shetty <ja...@gmail.com> on 2016/07/24 13:54:33 UTC, 2 replies.
- Restarting Spark Streaming Job periodically - posted by Prashant verma <vp...@gmail.com> on 2016/07/24 15:36:06 UTC, 0 replies.
- UDF to build a Vector? - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/24 16:12:27 UTC, 1 replies.
- java.lang.RuntimeException: Unsupported type: vector - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/24 16:50:32 UTC, 1 replies.
- Spark 2.0.0 RC 5 -- java.lang.AssertionError: assertion failed: Block rdd_[*] is not locked for reading - posted by Ameen Akel <am...@gmail.com> on 2016/07/24 17:00:47 UTC, 0 replies.
- How to read content of hdfs files - posted by Bhupendra Mishra <bh...@gmail.com> on 2016/07/24 17:29:21 UTC, 0 replies.
- Frequent Item Pattern Spark ML Dataframes - posted by janardhan shetty <ja...@gmail.com> on 2016/07/24 18:18:55 UTC, 1 replies.
- Outer Explode needed - posted by Don Drake <do...@gmail.com> on 2016/07/24 19:18:36 UTC, 1 replies.
- Spark 1.6.2 version displayed as 1.6.1 - posted by Ascot Moss <as...@gmail.com> on 2016/07/24 23:33:25 UTC, 3 replies.
- K-means Evaluation metrics - posted by janardhan shetty <ja...@gmail.com> on 2016/07/25 00:30:51 UTC, 1 replies.
- Bzip2 to Parquet format - posted by janardhan shetty <ja...@gmail.com> on 2016/07/25 00:34:25 UTC, 3 replies.
- [Error] : Save dataframe to csv using Spark-csv in Spark 1.6 - posted by Divya Gehlot <di...@gmail.com> on 2016/07/25 03:22:19 UTC, 0 replies.
- where I can find spark-streaming-kafka for spark2.0 - posted by kevin <ki...@gmail.com> on 2016/07/25 04:05:04 UTC, 6 replies.
- unsubscribe) - posted by Uzi Hadad <uz...@mta.ac.il> on 2016/07/25 05:36:57 UTC, 1 replies.
- Hive and distributed sql engine - posted by Marco Colombo <in...@gmail.com> on 2016/07/25 06:48:10 UTC, 2 replies.
- spark2.0 can't run SqlNetworkWordCount - posted by kevin <ki...@gmail.com> on 2016/07/25 09:33:31 UTC, 0 replies.
- Fwd: PySpark : Filter based on resultant query without additional dataframe - posted by kiran kumar <ku...@gmail.com> on 2016/07/25 11:03:34 UTC, 0 replies.
- read parquetfile in spark-sql error - posted by cj <12...@qq.com> on 2016/07/25 11:08:47 UTC, 3 replies.
- add spark-csv jar to ipython notbook without packages flags - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/25 11:27:20 UTC, 2 replies.
- SPARK SQL and join pipeline issue - posted by "Carlo.Allocca" <ca...@open.ac.uk> on 2016/07/25 12:09:47 UTC, 0 replies.
- jdbcRRD and dataframe - posted by Marco Colombo <in...@gmail.com> on 2016/07/25 14:14:42 UTC, 4 replies.
- Pls assist: Creating Spak EC2 cluster using spark_ec2.py script and a custom AMI - posted by Marco Mistroni <mm...@gmail.com> on 2016/07/25 14:37:40 UTC, 1 replies.
- get hdfs file path in spark - posted by Yang Cao <cy...@gmail.com> on 2016/07/25 15:59:30 UTC, 0 replies.
- Executors assigned to STS and number of workers in Stand Alone Mode - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/25 16:17:07 UTC, 11 replies.
- Using DirectOutputCommitter with ORC - posted by Daniel Haviv <da...@veracity-group.com> on 2016/07/25 16:41:48 UTC, 0 replies.
- Performance tuning for standalone on one host - posted by on <sc...@web.de> on 2016/07/25 17:21:46 UTC, 1 replies.
- JavaRDD.foreach (new VoidFunction<>...) always returns the last element - posted by Jia Zou <ja...@gmail.com> on 2016/07/25 17:50:53 UTC, 2 replies.
- Re: Performance tuning for local mode on one host - posted by on <sc...@web.de> on 2016/07/25 18:19:44 UTC, 2 replies.
- Spark 2.0 - posted by Bryan Jeffrey <br...@gmail.com> on 2016/07/25 18:23:30 UTC, 2 replies.
- Check out Kyper! Trying to be Uber of Data - posted by Daniel Lopes <da...@onematch.com.br> on 2016/07/25 20:15:54 UTC, 0 replies.
- SPARK UDF related issue - posted by "Carlo.Allocca" <ca...@open.ac.uk> on 2016/07/25 21:49:59 UTC, 0 replies.
- How to partition a SparkDataFrame using all distinct column values in sparkR - posted by Neil Chang <ia...@gmail.com> on 2016/07/25 22:46:25 UTC, 0 replies.
- Spark SQL overwrite/append for partitioned tables - posted by Pedro Rodriguez <sk...@gmail.com> on 2016/07/25 23:18:17 UTC, 4 replies.
- DAGScheduler: Job 20 finished: collectAsMap at DecisionTree.scala:651, took 19.556700 s Killed - posted by Ascot Moss <as...@gmail.com> on 2016/07/25 23:27:00 UTC, 4 replies.
- Num of executors and cores - posted by "Mail.com" <pr...@mail.com> on 2016/07/26 00:18:49 UTC, 4 replies.
- Potential Change in Kafka's Partition Assignment Semantics when Subscription Changes - posted by Vahid S Hashemian <va...@us.ibm.com> on 2016/07/26 00:20:03 UTC, 2 replies.
- Spark Web UI port 4040 not working - posted by Jestin Ma <je...@gmail.com> on 2016/07/26 01:21:29 UTC, 9 replies.
- Re: Odp.: spark2.0 can't run SqlNetworkWordCount - posted by kevin <ki...@gmail.com> on 2016/07/26 01:59:59 UTC, 0 replies.
- ORC v/s Parquet for Spark 2.0 - posted by janardhan shetty <ja...@gmail.com> on 2016/07/26 02:09:24 UTC, 25 replies.
- UDF returning generic Seq - posted by Chris Beavers <cb...@trifacta.com> on 2016/07/26 02:32:12 UTC, 2 replies.
- spark2.0 how to use sparksession and StreamingContext same time - posted by kevin <ki...@gmail.com> on 2016/07/26 03:25:12 UTC, 2 replies.
- Reply: read parquetfile in spark-sql error - posted by cj <12...@qq.com> on 2016/07/26 04:13:35 UTC, 0 replies.
- yarn.exceptions.ApplicationAttemptNotFoundException when trying to shut down spark applicaiton via yarn applicaiton --kill - posted by Yu Wei <yu...@hotmail.com> on 2016/07/26 06:21:57 UTC, 1 replies.
- dataframe.foreach VS dataframe.collect().foreach - posted by kevin <ki...@gmail.com> on 2016/07/26 07:30:14 UTC, 5 replies.
- PCA machine learning - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/26 08:39:25 UTC, 1 replies.
- Spark streaming lost data when ReceiverTracker writes Blockinfo to hdfs timeout - posted by Andy Zhao <an...@gmail.com> on 2016/07/26 09:45:50 UTC, 1 replies.
- FileUtil.fullyDelete does ? - posted by Divya Gehlot <di...@gmail.com> on 2016/07/26 10:51:39 UTC, 1 replies.
- sbt build under scala - posted by Martin Somers <so...@gmail.com> on 2016/07/26 12:54:38 UTC, 1 replies.
- Event Log Compression - posted by Bryan Jeffrey <br...@gmail.com> on 2016/07/26 13:33:59 UTC, 1 replies.
- Question on set membership / diff sync technique in Spark - posted by Natu Lauchande <nl...@gmail.com> on 2016/07/26 13:45:27 UTC, 0 replies.
- spark sql aggregate function "Nth" - posted by Alex Nastetsky <al...@vervemobile.com> on 2016/07/26 14:57:14 UTC, 2 replies.
- ioStreams for DataFrameReader/Writer - posted by Roger Holenweger <ro...@lotadata.com> on 2016/07/26 15:01:10 UTC, 0 replies.
- sparse vector to dense vecotor in pyspark - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/26 15:03:46 UTC, 0 replies.
- Is RowMatrix missing in org.apache.spark.ml package? - posted by Rohit Chaddha <ro...@gmail.com> on 2016/07/26 17:20:49 UTC, 1 replies.
- getting more concurrency best practices - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/26 17:44:55 UTC, 0 replies.
- File System closed while submitting job in spark - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/07/26 18:50:41 UTC, 0 replies.
- read only specific jsons - posted by vr spark <vr...@gmail.com> on 2016/07/26 18:51:48 UTC, 4 replies.
- dynamic coalesce to pick file size - posted by Maurin Lenglart <ma...@cuberonlabs.com> on 2016/07/26 19:02:45 UTC, 1 replies.
- libraryDependencies - posted by Martin Somers <so...@gmail.com> on 2016/07/26 19:18:36 UTC, 5 replies.
- Re: how to use spark.mesos.constraints - posted by Jia Yu <ji...@gmail.com> on 2016/07/26 23:10:46 UTC, 1 replies.
- Spark Beginner Question - posted by Shi Yu <sh...@gmail.com> on 2016/07/27 04:37:58 UTC, 1 replies.
- The Future Of DStream - posted by Chang Chen <ba...@gmail.com> on 2016/07/27 05:02:05 UTC, 5 replies.
- Spark 2.0 just released - posted by Chanh Le <gi...@gmail.com> on 2016/07/27 05:10:38 UTC, 0 replies.
- Fail a batch in Spark Streaming forcefully based on business rules - posted by Hemalatha A <he...@googlemail.com> on 2016/07/27 05:12:33 UTC, 2 replies.
- Spark Jobs not getting shown in Spark UI browser - posted by Prashant verma <vp...@gmail.com> on 2016/07/27 05:46:55 UTC, 0 replies.
- Configure Spark to run with MemSQL DB Cluster - posted by Subhajit Purkayastha <sp...@p3si.net> on 2016/07/27 05:54:08 UTC, 1 replies.
- [ANNOUNCE] Announcing Apache Spark 2.0.0 - posted by Reynold Xin <rx...@databricks.com> on 2016/07/27 06:00:22 UTC, 4 replies.
- How to export a project to a JAR in Scala IDE for eclipse Correctly? - posted by lu...@sina.com on 2016/07/27 06:23:26 UTC, 1 replies.
- Setting spark.sql.shuffle.partitions Dynamically - posted by Brandon White <bw...@gmail.com> on 2016/07/27 06:26:03 UTC, 1 replies.
- spark - posted by ناهید بهجتی نجف آبادی <nh...@gmail.com> on 2016/07/27 10:45:26 UTC, 1 replies.
- tpcds for spark2.0 - posted by kevin <ki...@gmail.com> on 2016/07/27 11:06:02 UTC, 1 replies.
- Spark 2.0 SparkSession, SparkConf, SparkContext - posted by Jestin Ma <je...@gmail.com> on 2016/07/27 13:02:43 UTC, 2 replies.
- Possible to push sub-queries down into the DataSource impl? - posted by Timothy Potter <th...@gmail.com> on 2016/07/27 13:59:08 UTC, 3 replies.
- Building Spark 2 from source that does not include the Hive jars - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/27 14:55:49 UTC, 0 replies.
- Spark Standalone Cluster: Having a master and worker on the same node - posted by Jestin Ma <je...@gmail.com> on 2016/07/27 17:19:29 UTC, 2 replies.
- Writing custom Transformers and Estimators like Tokenizer in spark ML - posted by janardhan shetty <ja...@gmail.com> on 2016/07/27 17:31:51 UTC, 3 replies.
- spark 1.6.0 read s3 files error. - posted by freedafeng <fr...@yahoo.com> on 2016/07/27 17:36:09 UTC, 5 replies.
- spark-2.0 support for spark-ec2 ? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/27 18:30:36 UTC, 1 replies.
- Spark 2.0 - JavaAFTSurvivalRegressionExample doesn't work - posted by Robert Goodman <bs...@gmail.com> on 2016/07/27 18:33:29 UTC, 2 replies.
- Run times for Spark 1.6.2 compared to 2.1.0? - posted by Colin Beckingham <co...@kingston.net> on 2016/07/27 20:31:51 UTC, 1 replies.
- spark-2.x what is the default version of java ? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/27 20:59:51 UTC, 1 replies.
- how to copy local files to hdfs quickly? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/27 23:25:12 UTC, 1 replies.
- How do I download 2.0? The main download page isn't showing it? - posted by Jim O'Flaherty <ji...@gmail.com> on 2016/07/28 00:11:36 UTC, 2 replies.
- saveAsTextFile at treeEnsembleModels.scala:447, took 2.513396 s Killed - posted by Ascot Moss <as...@gmail.com> on 2016/07/28 00:49:46 UTC, 1 replies.
- DecisionTree currently only supports maxDepth <= 30 - posted by Ascot Moss <as...@gmail.com> on 2016/07/28 00:56:20 UTC, 0 replies.
- A question about Spark Cluster vs Local Mode - posted by Ascot Moss <as...@gmail.com> on 2016/07/28 01:48:13 UTC, 3 replies.
- performance problem when reading lots of small files created by spark streaming. - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/28 02:19:43 UTC, 3 replies.
- Spark Thrift Server 2.0 set spark.sql.shuffle.partitions not working when query - posted by Chanh Le <gi...@gmail.com> on 2016/07/28 06:59:57 UTC, 2 replies.
- Any reference of performance tuning on SparkSQL? - posted by Linyuxin <li...@huawei.com> on 2016/07/28 07:10:24 UTC, 1 replies.
- create external table from partitioned avro file - posted by Yang Cao <cy...@gmail.com> on 2016/07/28 07:15:09 UTC, 1 replies.
- Spark 2.0 on YARN - Dynamic Resource Allocation Behavior change? - posted by LONG WANG <wa...@163.com> on 2016/07/28 07:44:18 UTC, 2 replies.
- Materializing mapWithState .stateSnapshot() after ssc.stop - posted by Ben Teeuwen <bt...@gmail.com> on 2016/07/28 08:30:45 UTC, 0 replies.
- Spark Thrift Server (Spark 2.0) show table has value with NULL in all fields - posted by Chanh Le <gi...@gmail.com> on 2016/07/28 09:25:39 UTC, 13 replies.
- SPARK Exception thrown in awaitResult - posted by "Carlo.Allocca" <ca...@open.ac.uk> on 2016/07/28 09:44:59 UTC, 7 replies.
- Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1 - posted by Mohammad Tariq <do...@gmail.com> on 2016/07/28 11:45:09 UTC, 3 replies.
- reasons for introducing SPARK-9415 - disable group by on MapType - posted by Tomasz Bartczak <to...@allegrogroup.com> on 2016/07/28 12:05:10 UTC, 0 replies.
- Guys is this some form of Spam or someone has left his auto-reply loose LOL - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/28 13:59:06 UTC, 2 replies.
- Pls assist: need to create an udf that returns a LabeledPoint in pyspark - posted by Marco Mistroni <mm...@gmail.com> on 2016/07/28 14:13:15 UTC, 0 replies.
- RDD vs Dataset performance - posted by Darin McBeath <dd...@yahoo.com.INVALID> on 2016/07/28 16:52:47 UTC, 1 replies.
- Re: Unable to create a dataframe from json dstream using pyspark - posted by Sunil Kumar Chinnamgari <su...@yahoo.com.INVALID> on 2016/07/28 17:28:49 UTC, 0 replies.
- Spark 2.0 -- spark warehouse relative path in absolute URI error - posted by Rohit Chaddha <ro...@gmail.com> on 2016/07/28 17:47:08 UTC, 9 replies.
- Custom Image RDD and Sequence Files - posted by jtgenesis <jt...@gmail.com> on 2016/07/28 18:04:57 UTC, 1 replies.
- ClassTag variable in broadcast in spark 2.0 ? how to use - posted by Rohit Chaddha <ro...@gmail.com> on 2016/07/28 18:06:45 UTC, 1 replies.
- Problems initializing SparkUI - posted by Maximiliano Patricio Méndez <mm...@despegar.com> on 2016/07/28 21:37:37 UTC, 2 replies.
- Spark 2.0 Build Failed - posted by Ascot Moss <as...@gmail.com> on 2016/07/28 23:04:38 UTC, 6 replies.
- Re: spark run shell On yarn - posted by Jeff Zhang <zj...@gmail.com> on 2016/07/29 00:13:40 UTC, 2 replies.
- Question / issue while creating a parquet file using a text file with spark 2.0... - posted by Muthu Jayakumar <ba...@gmail.com> on 2016/07/29 04:14:34 UTC, 2 replies.
- how to save spark files as parquets efficiently - posted by Sumit Khanna <su...@askme.in> on 2016/07/29 06:27:48 UTC, 5 replies.
- correct / efficient manner to upsert / update in hdfs (via spark / in general) - posted by Sumit Khanna <su...@askme.in> on 2016/07/29 06:49:50 UTC, 4 replies.
- estimation of necessary time of execution - posted by pseudo oduesp <ps...@gmail.com> on 2016/07/29 12:08:56 UTC, 1 replies.
- sampling operation for DStream - posted by Martin Le <ma...@gmail.com> on 2016/07/29 14:57:25 UTC, 1 replies.
- Tuning level of Parallelism: Increase or decrease? - posted by Jestin Ma <je...@gmail.com> on 2016/07/29 16:02:13 UTC, 0 replies.
- The main difference use case between orderBY and sort - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/07/29 16:20:27 UTC, 2 replies.
- HBase-Spark Module - posted by Benjamin Kim <bb...@gmail.com> on 2016/07/29 17:56:35 UTC, 1 replies.
- multiple SPARK_LOCAL_DIRS causing strange behavior in parallelism - posted by Sa...@wellsfargo.com on 2016/07/29 18:21:18 UTC, 0 replies.
- pyspark 1.6.1 `partitionBy` does not provide meaningful information for `join` to use - posted by Sisyphuss <zh...@gmail.com> on 2016/07/29 18:58:12 UTC, 0 replies.
- Spark 1.6.1 Workaround: Properly handle signal kill of ApplicationMaster - posted by Jatinder Assi <us...@gmail.com> on 2016/07/29 19:12:22 UTC, 1 replies.
- use big files and read from HDFS was: performance problem when reading lots of small files created by spark streaming. - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/29 20:04:20 UTC, 1 replies.
- Java Recipes for Spark - posted by Jean Georges Perrin <jg...@jgp.net> on 2016/07/29 20:30:41 UTC, 5 replies.
- sql to spark scala rdd - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/07/30 02:42:12 UTC, 10 replies.
- PySpark 1.6.1: 'builtin_function_or_method' object has no attribute '__code__' in Pickles - posted by Bhaarat Sharma <bh...@gmail.com> on 2016/07/30 05:24:37 UTC, 4 replies.
- Spark 2.0 blocker on windows - spark-warehouse path issue - posted by Tony Lane <to...@gmail.com> on 2016/07/30 09:27:02 UTC, 1 replies.
- Structured Streaming Parquet Sink - posted by Arun Patel <ar...@gmail.com> on 2016/07/30 12:50:34 UTC, 5 replies.
- [SPARK-3586][streaming]Support nested directories in Spark - posted by 接立骞 <ji...@unisound.com> on 2016/07/30 17:26:25 UTC, 0 replies.
- Spark SQL - Invalid method name: 'alter_table_with_cascade' - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/07/30 18:15:49 UTC, 0 replies.
- how to order data in descending order in spark dataset - posted by Tony Lane <to...@gmail.com> on 2016/07/30 18:30:10 UTC, 3 replies.
- Visualization of data analysed using spark - posted by Tony Lane <to...@gmail.com> on 2016/07/30 19:45:34 UTC, 3 replies.
- How to filter based on a constant value - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/07/30 20:49:07 UTC, 16 replies.
- Dataframe and corresponding RDD return different rows (PySpark) - posted by parameshr <pa...@gmail.com> on 2016/07/30 21:54:42 UTC, 1 replies.
- How to write contents of RDD to HDFS as separate file for each item in RDD (PySpark) - posted by Bhaarat Sharma <bh...@gmail.com> on 2016/07/30 22:57:54 UTC, 2 replies.
- spark-submit hangs forever after all tasks finish(spark 2.0.0 stable version on yarn) - posted by taozhuo <ta...@gmail.com> on 2016/07/30 23:34:22 UTC, 2 replies.
- multiple spark streaming contexts - posted by Sumit Khanna <su...@askme.in> on 2016/07/31 05:20:02 UTC, 1 replies.
- [Spark 2.0] Why MutableInt cannot be cast to MutableLong? - posted by Chanh Le <gi...@gmail.com> on 2016/07/31 09:12:19 UTC, 1 replies.
- spark 2.0 readStream from a REST API - posted by Ayoub Benali <be...@gmail.com> on 2016/07/31 10:53:03 UTC, 4 replies.
- Spark R 2.0 dapply very slow - posted by Yann-Aël Le Borgne <ya...@gmail.com> on 2016/07/31 13:14:29 UTC, 0 replies.
- spark java - convert string to date - posted by Tony Lane <to...@gmail.com> on 2016/07/31 13:57:36 UTC, 0 replies.
- calling dataset.show on a custom object - displays toString() value as first column and blank for rest - posted by Rohit Chaddha <ro...@gmail.com> on 2016/07/31 14:16:01 UTC, 1 replies.
- Spark Sql - Losing connection with Hive Metastore - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/07/31 15:57:17 UTC, 0 replies.
- error while running filter on dataframe - posted by Tony Lane <to...@gmail.com> on 2016/07/31 16:14:51 UTC, 3 replies.
- build error - failing test- Error while building spark 2.0 trunk from github - posted by Rohit Chaddha <ro...@gmail.com> on 2016/07/31 16:54:28 UTC, 1 replies.
- Re: Clean up app folders in worker nodes - posted by pbirsinger <pb...@gmail.com> on 2016/07/31 20:42:15 UTC, 0 replies.