user@spark.apache.org, 2016-01

- does HashingTF maintain a inverse index? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/01 01:20:57 UTC, 2 replies.
- Re: Spark MLLib KMeans Performance on Amazon EC2 M3.2xlarge - posted by Yanbo Liang <yb...@gmail.com> on 2016/01/01 16:42:02 UTC, 0 replies.
- Re: How to specify the numFeatures in HashingTF - posted by Yanbo Liang <yb...@gmail.com> on 2016/01/01 16:48:41 UTC, 1 replies.
- Re: ERROR server.TThreadPoolServer: Error occurred during processing of message - posted by Dasun Hegoda <da...@gmail.com> on 2016/01/01 17:03:22 UTC, 0 replies.
- Re: NotSerializableException exception while using TypeTag in Scala 2.10 - posted by Yanbo Liang <yb...@gmail.com> on 2016/01/01 17:05:00 UTC, 0 replies.
- Deploying on TOMCAT - posted by rahulganesh <dr...@gmail.com> on 2016/01/01 19:10:09 UTC, 0 replies.
- sqlContext Client cannot authenticate via:[TOKEN, KERBEROS] - posted by philippe L <la...@gmail.com> on 2016/01/01 20:17:34 UTC, 0 replies.
- how to extend java transformer from Scala UnaryTransformer ? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/01 20:38:49 UTC, 1 replies.
- Re: SparkSQL integration issue with AWS S3a - posted by Jerry Lam <ch...@gmail.com> on 2016/01/01 22:35:18 UTC, 3 replies.
- How to find cause(waiting threads etc) of hanging job for 7 hours? - posted by unk1102 <um...@gmail.com> on 2016/01/01 22:56:28 UTC, 7 replies.
- frequent itemsets - posted by Roberto Pagliari <ro...@asos.com> on 2016/01/02 00:51:30 UTC, 5 replies.
- Unable to read JSON input in Spark (YARN Cluster) - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2016/01/02 02:48:49 UTC, 1 replies.
- Cannot get repartitioning to work - posted by jimitkr <ji...@softpath.net> on 2016/01/02 03:38:06 UTC, 2 replies.
- Re: How to save only values via saveAsHadoopFile or saveAsNewAPIHadoopFile - posted by jimitkr <ji...@softpath.net> on 2016/01/02 03:42:08 UTC, 0 replies.
- Re: [SparkSQL][Parquet] Read from nested parquet data - posted by lin <ku...@gmail.com> on 2016/01/02 07:08:25 UTC, 0 replies.
- How to load partial data from HDFS using Spark SQL - posted by SRK <sw...@gmail.com> on 2016/01/02 07:26:36 UTC, 2 replies.
- Re: Problem embedding GaussianMixtureModel in a closure - posted by Yanbo Liang <yb...@gmail.com> on 2016/01/02 09:45:11 UTC, 2 replies.
- Does state survive application restart in StatefulNetworkWordCount? - posted by Rado Buranský <ra...@gmail.com> on 2016/01/02 15:22:04 UTC, 2 replies.
- feedback on the use of Spark’s gateway hidden REST API (standalone cluster mode) for application submission - posted by HILEM Youcef <yo...@laposte.fr> on 2016/01/02 17:14:21 UTC, 1 replies.
- Can a tempTable registered by sqlContext be used inside a forEachRDD? - posted by SRK <sw...@gmail.com> on 2016/01/03 12:50:53 UTC, 1 replies.
- GLM I'm ml pipeline - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/01/03 15:50:03 UTC, 3 replies.
- subscribe - posted by Rajdeep Dua <ra...@gmail.com> on 2016/01/03 17:52:41 UTC, 4 replies.
- Calculate sum of values in 2nd element of tuple - posted by jimitkr <ji...@softpath.net> on 2016/01/03 21:00:09 UTC, 2 replies.
- Re: translate algorithm in spark - posted by robert_dodier <ro...@gmail.com> on 2016/01/04 00:34:03 UTC, 0 replies.
- Unable to run spark SQL Join query. - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2016/01/04 04:22:25 UTC, 3 replies.
- sql:Exception in thread "main" scala.MatchError: StringType - posted by Bonsen <he...@126.com> on 2016/01/04 04:26:08 UTC, 1 replies.
- Is Spark 1.6 released? - posted by Jung <jb...@naver.com> on 2016/01/04 10:06:20 UTC, 8 replies.
- Re: pyspark streaming crashes - posted by Antony Mayi <an...@yahoo.com.INVALID> on 2016/01/04 10:40:05 UTC, 0 replies.
- stopping a process usgin an RDD - posted by domibd <db...@lipn.univ-paris13.fr> on 2016/01/04 13:05:56 UTC, 2 replies.
- Trying to run GraphX ConnectedComponents for large data with out success - posted by "Dagan, Arnon" <ar...@ebay.com> on 2016/01/04 13:24:40 UTC, 0 replies.
- [ANNOUNCE] Announcing Spark 1.6.0 - posted by Michael Armbrust <mi...@databricks.com> on 2016/01/04 17:50:00 UTC, 0 replies.
- Re: Spark 1.4 RDD to DF fails with toDF() - posted by Fab <fa...@circleback.com> on 2016/01/04 18:52:54 UTC, 0 replies.
- Spark Job Server with Yarn and Kerberos - posted by Mike Wright <mw...@snl.com> on 2016/01/04 19:22:43 UTC, 1 replies.
- Comparing Subsets of an RDD - posted by Daniel Imberman <da...@gmail.com> on 2016/01/04 20:24:04 UTC, 1 replies.
- HiveThriftServer fails to quote strings - posted by sclyon <sc...@microsoft.com> on 2016/01/04 20:36:13 UTC, 2 replies.
- Spark Streaming Application is Stuck Under Heavy Load Due to DeadLock - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/01/04 21:56:35 UTC, 0 replies.
- RE: SparkML algos limitations question. - posted by "Ulanov, Alexander" <al...@hpe.com> on 2016/01/04 22:06:18 UTC, 1 replies.
- Re: Spark Streaming Application is Stuck Under Heavy Load Due to DeadLock - posted by Shixiong Zhu <zs...@gmail.com> on 2016/01/04 22:10:03 UTC, 0 replies.
- Re: Batch together RDDs for Streaming output, without delaying execution of map or transform functions - posted by Tathagata Das <td...@databricks.com> on 2016/01/04 22:55:57 UTC, 0 replies.
- Re: email not showing up on the mailing list - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2016/01/04 22:58:00 UTC, 0 replies.
- Monitor Job on Yarn - posted by Daniel Valdivia <ho...@danielvaldivia.com> on 2016/01/04 23:49:30 UTC, 3 replies.
- groupByKey does not work? - posted by Arun Luthra <ar...@gmail.com> on 2016/01/05 00:55:20 UTC, 7 replies.
- copy/mv hdfs file to another directory by spark program - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/01/05 04:07:43 UTC, 2 replies.
- problem with DataFrame df.withColumn() org.apache.spark.sql.AnalysisException: resolved attribute(s) missing - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/05 05:08:20 UTC, 8 replies.
- Negative Number of Active Tasks in Spark UI - posted by Prasad Ravilla <pr...@slalom.com> on 2016/01/05 07:05:15 UTC, 3 replies.
- unsubscribe - posted by Irvin <r....@foxmail.com> on 2016/01/05 07:35:05 UTC, 0 replies.
- [discuss] dropping Python 2.6 support - posted by Reynold Xin <rx...@databricks.com> on 2016/01/05 08:17:07 UTC, 25 replies.
- Security authentication interface for Spark - posted by jiehua <bj...@cn.ibm.com> on 2016/01/05 08:53:26 UTC, 0 replies.
- finding distinct count using dataframe - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/01/05 10:11:17 UTC, 3 replies.
- Can spark.scheduler.pool be applied globally ? - posted by Jeff Zhang <zj...@gmail.com> on 2016/01/05 10:57:22 UTC, 6 replies.
- Is there a way to use parallelize function in sparkR spark version (1.6.0) - posted by Chandan Verma <ch...@citiustech.com> on 2016/01/05 11:36:47 UTC, 1 replies.
- RE: Spark Streaming + Kafka + scala job message read issue - posted by vi...@wipro.com on 2016/01/05 12:08:10 UTC, 1 replies.
- sparkR ORC support. - posted by Sandeep Khurana <sa...@infoworks.io> on 2016/01/05 12:27:59 UTC, 17 replies.
- pyspark Dataframe and histogram through ggplot (python) - posted by Snehotosh Banerjee <sn...@gmail.com> on 2016/01/05 13:32:17 UTC, 1 replies.
- Re: java.io.FileNotFoundException(Too many open files) in Spark streaming - posted by Priya Ch <le...@gmail.com> on 2016/01/05 14:03:36 UTC, 4 replies.
- Networking problems in Spark 1.6.0 - posted by Yiannis Gkoufas <jo...@gmail.com> on 2016/01/05 15:29:44 UTC, 4 replies.
- Handling futures from foreachPartitionAsync in Spark Streaming - posted by Trevor <tr...@ave81.com> on 2016/01/05 15:48:57 UTC, 0 replies.
- Double Counting When Using Accumulators with Spark Streaming - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/01/05 17:14:50 UTC, 3 replies.
- Spark 1.6 - Datasets and Avro Encoders - posted by Olivier Girardot <o....@lateral-thoughts.com> on 2016/01/05 18:45:53 UTC, 3 replies.
- Spark on Apache Ingnite? - posted by unk1102 <um...@gmail.com> on 2016/01/05 19:14:52 UTC, 6 replies.
- coalesce(1).saveAsTextfile() takes forever? - posted by unk1102 <um...@gmail.com> on 2016/01/05 20:58:04 UTC, 4 replies.
- Spark SQL dataframes explode /lateral view help - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/05 21:00:16 UTC, 1 replies.
- spark-itemsimilarity No FileSystem for scheme error - posted by roy <rp...@njit.edu> on 2016/01/05 21:22:17 UTC, 0 replies.
- aggregateByKey vs combineByKey - posted by Marco Mistroni <mm...@gmail.com> on 2016/01/05 22:13:40 UTC, 3 replies.
- sortBy transformation shows as a job - posted by Soumitra Kumar <ku...@gmail.com> on 2016/01/05 22:43:09 UTC, 0 replies.
- How to concat few rows into a new column in dataframe - posted by Gavin Yue <yu...@gmail.com> on 2016/01/05 23:46:53 UTC, 5 replies.
- problem building spark on centos - posted by Jade Liu <ja...@nor1.com> on 2016/01/06 01:54:41 UTC, 9 replies.
- pyspark dataframe: row with a minimum value of a column for each group - posted by Wei Chen <we...@gmail.com> on 2016/01/06 01:56:31 UTC, 3 replies.
- DataFrame withColumnRenamed throwing NullPointerException - posted by Prasad Ravilla <pr...@slalom.com> on 2016/01/06 02:10:01 UTC, 0 replies.
- UpdateStateByKey : Partitioning and Shuffle - posted by Soumitra Johri <so...@gmail.com> on 2016/01/06 02:21:23 UTC, 1 replies.
- How to use Java8 - posted by Sea <26...@qq.com> on 2016/01/06 03:16:27 UTC, 1 replies.
- Re: 101 question on external metastore - posted by Yana Kadiyska <ya...@gmail.com> on 2016/01/06 03:55:21 UTC, 2 replies.
- Out of memory issue - posted by babloo80 <ba...@gmail.com> on 2016/01/06 04:44:11 UTC, 2 replies.
- [Spark-SQL] Custom aggregate function for GrouppedData - posted by Abhishek Gayakwad <a....@gmail.com> on 2016/01/06 05:14:23 UTC, 2 replies.
- Re: How to use Java8 - posted by Sea <26...@qq.com> on 2016/01/06 07:43:42 UTC, 0 replies.
- How to accelerate reading json file? - posted by Gavin Yue <yu...@gmail.com> on 2016/01/06 08:13:30 UTC, 3 replies.
- How to insert df in HBASE - posted by Sadaf <sa...@platalytics.com> on 2016/01/06 12:07:01 UTC, 1 replies.
- Spark DataFrame limit question - posted by Arkadiusz Bicz <ar...@gmail.com> on 2016/01/06 13:08:17 UTC, 0 replies.
- Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/06 14:43:41 UTC, 0 replies.
- spark 1.6 Issue - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/01/06 15:35:52 UTC, 3 replies.
- Spark Streaming: process only last events - posted by Julien Naour <ju...@gmail.com> on 2016/01/06 15:52:36 UTC, 6 replies.
- When to use streaming state and when an external storage? - posted by Rado Buranský <ra...@gmail.com> on 2016/01/06 17:28:16 UTC, 0 replies.
- fp growth - clean up repetitions in input - posted by matd <ma...@gmail.com> on 2016/01/06 19:28:55 UTC, 0 replies.
- What should be the ideal value(unit) for spark.memory.offheap.size - posted by unk1102 <um...@gmail.com> on 2016/01/06 19:29:19 UTC, 0 replies.
- Why is this job running since one hour? - posted by unk1102 <um...@gmail.com> on 2016/01/06 19:33:12 UTC, 2 replies.
- Predictive Modelling in sparkR - posted by Chandan Verma <ch...@citiustech.com> on 2016/01/06 19:45:49 UTC, 2 replies.
- Spark Token Expired Exception - posted by Nikhil Gs <gs...@gmail.com> on 2016/01/06 21:16:40 UTC, 2 replies.
- Re: error writing to stdout - posted by Bryan Cutler <cu...@gmail.com> on 2016/01/06 21:17:25 UTC, 0 replies.
- Timeout connecting between workers after upgrade to 1.6 - posted by Jeff Jones <jj...@adaptivebiotech.com> on 2016/01/06 22:57:39 UTC, 3 replies.
- Re: What should be the ideal value(unit) for spark.memory.offheap.size - posted by Jakob Odersky <jo...@gmail.com> on 2016/01/06 23:35:01 UTC, 4 replies.
- Problems with too many checkpoint files with Spark Streaming - posted by Jan Algermissen <al...@icloud.com> on 2016/01/07 00:13:29 UTC, 3 replies.
- Date and Time as a Feature - posted by Jorge Machado <jo...@hotmail.com> on 2016/01/07 00:47:43 UTC, 0 replies.
- connecting beeline to spark sql thrift server - posted by Sunil Kumar <pa...@yahoo.com.INVALID> on 2016/01/07 01:32:44 UTC, 0 replies.
- spark dataframe read large mysql table running super slow - posted by "fightfate@163.com" <fi...@163.com> on 2016/01/07 03:47:13 UTC, 0 replies.
- org.apache.spark.storage.BlockNotFoundException in Spark1.5.2+Tachyon0.7.1 - posted by Jia Zou <ja...@gmail.com> on 2016/01/07 05:41:14 UTC, 2 replies.
- Need Help in Spark Hive Data Processing - posted by "Balaraju.Kagidala Kagidala" <ba...@gmail.com> on 2016/01/07 05:47:18 UTC, 2 replies.
- LogisticsRegression in ML pipeline help page - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/01/07 05:53:41 UTC, 1 replies.
- Any way to accelerate the Data Frame repartition? - posted by Gavin Yue <yu...@gmail.com> on 2016/01/07 07:14:17 UTC, 0 replies.
- Update Hive tables from Spark without loading entire table in to a dataframe - posted by sudhir <su...@gmail.com> on 2016/01/07 08:10:16 UTC, 2 replies.
- [Spark 1.6] Spark Streaming - java.lang.AbstractMethodError - posted by Walid LEZZAR <wa...@gmail.com> on 2016/01/07 10:59:43 UTC, 9 replies.
- Window Functions importing issue in Spark 1.4.0 - posted by satish chandra j <js...@gmail.com> on 2016/01/07 12:04:36 UTC, 5 replies.
- How HiveContext can read subdirectories - posted by Arkadiusz Bicz <ar...@gmail.com> on 2016/01/07 12:28:06 UTC, 0 replies.
- Error starting solr with plugin which uses spark libraries - posted by Rahul Kumar <ra...@snapdeal.com> on 2016/01/07 14:40:30 UTC, 0 replies.
- How to split a huge rdd and broadcast it by turns? - posted by Demon King <kd...@gmail.com> on 2016/01/07 15:01:27 UTC, 1 replies.
- Re: Can't submit job to stand alone cluster - posted by Greg Hill <gr...@RACKSPACE.COM> on 2016/01/07 15:35:29 UTC, 0 replies.
- spark ui security - posted by Kostiantyn Kudriavtsev <ku...@gmail.com> on 2016/01/07 15:35:48 UTC, 6 replies.
- Spark shell throws java.lang.RuntimeException - posted by will <wi...@chronopost.fr> on 2016/01/07 15:49:49 UTC, 0 replies.
- How to get the driver id programmatically? - posted by Greg Hill <gr...@RACKSPACE.COM> on 2016/01/07 16:34:30 UTC, 0 replies.
- How to load specific Hive partition in DataFrame Spark 1.6? - posted by unk1102 <um...@gmail.com> on 2016/01/07 16:34:41 UTC, 3 replies.
- Question in rdd caching in memory using persist - posted by se...@nomura.com on 2016/01/07 17:51:25 UTC, 3 replies.
- Spark streaming routing - posted by Lin Zhao <li...@exabeam.com> on 2016/01/07 18:34:05 UTC, 2 replies.
- Problems with reading data from parquet files in a HDFS remotely - posted by Henrik Baastrup <he...@netscout.com> on 2016/01/07 18:53:34 UTC, 4 replies.
- Date Time Regression as Feature - posted by Jorge Machado <jo...@hotmail.com> on 2016/01/07 20:09:13 UTC, 1 replies.
- Re: Date Time Regression as Feature - posted by Sujit Pal <su...@gmail.com> on 2016/01/07 20:25:05 UTC, 3 replies.
- "impossible to get artifacts " error when using sbt to build 1.6.0 for scala 2.11 - posted by Lin Zhao <li...@exabeam.com> on 2016/01/07 20:26:57 UTC, 2 replies.
- adding jars - hive on spark cdh 5.4.3 - posted by Ophir Etzion <op...@foursquare.com> on 2016/01/07 22:03:03 UTC, 6 replies.
- Spark job uses only one Worker - posted by Michael Pisula <mi...@tngtech.com> on 2016/01/07 22:24:32 UTC, 10 replies.
- SparkContext SyntaxError: invalid syntax - posted by weineran <an...@u.northwestern.edu> on 2016/01/07 23:39:13 UTC, 16 replies.
- Re: Large scale ranked recommendation - posted by xenocyon <ap...@gmail.com> on 2016/01/08 02:56:52 UTC, 0 replies.
- Newbie question - posted by yuliya Feldman <yu...@yahoo.com.INVALID> on 2016/01/08 07:36:44 UTC, 6 replies.
- Recommendations using Spark - posted by anjali gautam <an...@gmail.com> on 2016/01/08 08:11:42 UTC, 4 replies.
- Spark Context not getting initialized in local mode - posted by Rahul Kumar <ra...@snapdeal.com> on 2016/01/08 08:24:22 UTC, 1 replies.
- Re: "Spark-events does not exist" error, while it does with all the req. rights - posted by Robineast <Ro...@xense.co.uk> on 2016/01/08 10:18:04 UTC, 0 replies.
- Unable to compile from source - posted by Gaini Rajeshwar <ra...@gmail.com> on 2016/01/08 12:23:28 UTC, 2 replies.
- how deploy pmml model in spark - posted by Sangameshwar Swami <sa...@hcl.com> on 2016/01/08 13:36:19 UTC, 0 replies.
- write new data to mysql - posted by Yasemin Kaya <go...@gmail.com> on 2016/01/08 16:36:17 UTC, 6 replies.
- Re: Kryo serializer Exception during serialization: java.io.IOException: java.lang.IllegalArgumentException: - posted by jiml <ji...@megalearningllc.com> on 2016/01/08 17:52:00 UTC, 0 replies.
- Efficient join multiple times - posted by Jason White <ja...@shopify.com> on 2016/01/08 17:56:06 UTC, 0 replies.
- Re: Kryo serializer Exception during serialization: java.io.IOException: java.lang.IllegalArgumentException: - posted by Ted Yu <yu...@gmail.com> on 2016/01/08 17:58:35 UTC, 1 replies.
- Create a n x n graph given only the vertices - posted by praveen S <my...@gmail.com> on 2016/01/08 19:27:38 UTC, 0 replies.
- Do we need to enabled Tungsten sort in Spark 1.6? - posted by unk1102 <um...@gmail.com> on 2016/01/08 21:21:13 UTC, 4 replies.
- How to merge two large table and remove duplicates? - posted by Gavin Yue <yu...@gmail.com> on 2016/01/08 23:04:34 UTC, 18 replies.
- Standalone Scala Project 'sbt package erroring out" - posted by srkanth devineni <ds...@gmail.com> on 2016/01/08 23:46:43 UTC, 0 replies.
- How to compile Python and use How to compile Python and use spark-submit - posted by Ascot Moss <as...@gmail.com> on 2016/01/09 03:44:47 UTC, 1 replies.
- how garbage collection works on parallelize - posted by jluan <ja...@gmail.com> on 2016/01/09 03:50:28 UTC, 1 replies.
- pyspark: conditionals inside functions - posted by Franc Carter <fr...@gmail.com> on 2016/01/09 04:45:15 UTC, 3 replies.
- (Unknown) - posted by Suresh Thalamati <su...@gmail.com> on 2016/01/09 06:06:12 UTC, 1 replies.
- Re: Benchmarking with multiple users in Spark - posted by Chris Fregly <ch...@fregly.com> on 2016/01/09 08:53:39 UTC, 0 replies.
- broadcast params to workers at the very beginning - posted by "octavian.ganea" <oc...@inf.ethz.ch> on 2016/01/09 16:12:04 UTC, 1 replies.
- Best IDE Configuration - posted by Jorge Machado <jo...@me.com> on 2016/01/09 20:16:52 UTC, 2 replies.
- spark access old version of Hadoop 2.1.0 and Hive version 0.11 - posted by Jade Liu <ja...@nor1.com> on 2016/01/09 20:26:39 UTC, 0 replies.
- pyspark: calculating row deltas - posted by Franc Carter <fr...@gmail.com> on 2016/01/09 22:55:15 UTC, 4 replies.
- StandardScaler in spark.ml.feature requires vector input? - posted by Kristina Rogale Plazonic <kp...@gmail.com> on 2016/01/10 01:10:57 UTC, 1 replies.
- java.lang.NoClassDefFoundError even when use sc.addJar - posted by rayqiu <ra...@gmail.com> on 2016/01/10 01:12:51 UTC, 0 replies.
- Too many tasks killed the scheduler - posted by Gavin Yue <yu...@gmail.com> on 2016/01/10 10:51:13 UTC, 4 replies.
- Negative Number of Workers used memory in Spark UI - posted by Ricky <49...@qq.com> on 2016/01/11 02:59:54 UTC, 0 replies.
- Re: Create a n x n graph given only the vertices no - posted by praveen S <my...@gmail.com> on 2016/01/11 04:19:24 UTC, 4 replies.
- Spark 1.6 udf/udaf alternatives in dataset? - posted by Muthu Jayakumar <ba...@gmail.com> on 2016/01/11 06:37:10 UTC, 7 replies.
- pre-install 3-party Python package on spark cluster - posted by "taotao.li" <ch...@gmail.com> on 2016/01/11 06:50:35 UTC, 3 replies.
- parquet repartitions and parquet.enable.summary-metadata does not work - posted by Gavin Yue <yu...@gmail.com> on 2016/01/11 07:12:52 UTC, 3 replies.
- GroupBy on DataFrame taking too much time - posted by Gaini Rajeshwar <ra...@gmail.com> on 2016/01/11 08:43:03 UTC, 3 replies.
- Getting an error while submitting spark jar - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/01/11 08:49:15 UTC, 1 replies.
- Getting kafka offsets at beginning of spark streaming application - posted by Abhishek Anand <ab...@gmail.com> on 2016/01/11 10:09:40 UTC, 3 replies.
- XML column not supported in Database - posted by Gaini Rajeshwar <ra...@gmail.com> on 2016/01/11 10:44:28 UTC, 2 replies.
- Logger overridden when using JavaSparkContext - posted by Max Schmidt <ma...@datapath.io> on 2016/01/11 10:56:41 UTC, 2 replies.
- Read from AWS s3 with out having to hard-code sensitive keys - posted by Krishna Rao <kr...@gmail.com> on 2016/01/11 12:46:04 UTC, 4 replies.
- Spark integration with HCatalog (specifically regarding partitions) - posted by Elliot West <te...@gmail.com> on 2016/01/11 13:36:53 UTC, 2 replies.
- Deploying model built in SparkR - posted by Chandan Verma <ch...@citiustech.com> on 2016/01/11 13:40:05 UTC, 1 replies.
- Trying to understand dynamic resource allocation - posted by Yiannis Gkoufas <jo...@gmail.com> on 2016/01/11 14:10:22 UTC, 2 replies.
- Manipulate Twitter Stream Filter on runtime - posted by Filli Alem <Al...@ti8m.ch> on 2016/01/11 14:13:01 UTC, 1 replies.
- Long running jobs in CDH - posted by Jan Holmberg <ja...@perigeum.fi> on 2016/01/11 14:23:09 UTC, 1 replies.
- Spark SQL "partition stride"? - posted by Keith Freeman <8f...@gmail.com> on 2016/01/11 17:30:12 UTC, 0 replies.
- Design query regarding dataframe usecase - posted by Kapil Malik <ka...@snapdeal.com> on 2016/01/11 17:53:39 UTC, 0 replies.
- Re: partitioning RDD - posted by Ted Yu <yu...@gmail.com> on 2016/01/11 20:11:38 UTC, 1 replies.
- Best practices for sharing/maintaining large resource files for Spark jobs - posted by Dmitry Goldenberg <dg...@gmail.com> on 2016/01/11 21:14:06 UTC, 12 replies.
- Windows driver cannot run job on Linux cluster - posted by Andrew Wooster <an...@gmail.com> on 2016/01/11 22:10:44 UTC, 2 replies.
- [KafkaRDD]: rdd.cache() does not seem to work - posted by ponkin <al...@ya.ru> on 2016/01/11 22:13:34 UTC, 3 replies.
- Regarding sliding window example from Databricks for DStream - posted by Cassa L <lc...@gmail.com> on 2016/01/12 00:09:06 UTC, 1 replies.
- Put all elements of RDD into array - posted by Daniel Valdivia <ho...@danielvaldivia.com> on 2016/01/12 01:55:04 UTC, 3 replies.
- ibsnappyjava.so: failed to map segment from shared object - posted by yatinla <mi...@yatinla.com> on 2016/01/12 04:12:18 UTC, 2 replies.
- Re: how to submit multiple jar files when using spark-submit script in shell? - posted by jiml <ji...@megalearningllc.com> on 2016/01/12 07:34:33 UTC, 2 replies.
- Various ways to use --jars? Some undocumented ways? - posted by jiml <ji...@megalearningllc.com> on 2016/01/12 07:40:27 UTC, 0 replies.
- Unshaded google guava classes in spark-network-common jar - posted by Jake Yoon <su...@gmail.com> on 2016/01/12 08:28:24 UTC, 1 replies.
- model deployment in spark - posted by Chandan Verma <ch...@citiustech.com> on 2016/01/12 09:14:38 UTC, 0 replies.
- PCA OutOfMemoryError - posted by Bharath Ravi Kumar <re...@gmail.com> on 2016/01/12 09:36:45 UTC, 3 replies.
- Does spark restart the executors if its nodemanager crashes? - posted by Bing Jiang <ji...@gmail.com> on 2016/01/12 09:37:14 UTC, 0 replies.
- Re: JMXSink for YARN deployment - posted by Kyle Lin <ky...@gmail.com> on 2016/01/12 09:54:06 UTC, 1 replies.
- Using lz4 in Kafka seems to be broken by jpountz dependency upgrade in Spark 1.5.x+ - posted by Stefan Schadwinkel <st...@smaato.com> on 2016/01/12 10:30:01 UTC, 0 replies.
- Lost tasks due to OutOfMemoryError (GC overhead limit exceeded) - posted by Barak Yaish <ba...@gmail.com> on 2016/01/12 11:04:29 UTC, 1 replies.
- Maintain a state till the end of the application - posted by "turing.us" <tu...@gmail.com> on 2016/01/12 11:16:31 UTC, 0 replies.
- Use TCP client for id lookup - posted by Kristoffer Sjögren <st...@gmail.com> on 2016/01/12 11:35:52 UTC, 0 replies.
- Top K Parallel FPGrowth Implementation - posted by jcbarton <jo...@purenet.co.uk> on 2016/01/12 11:43:55 UTC, 0 replies.
- Job History Logs for spark jobs submitted on YARN - posted by laxmanvemula <la...@gmail.com> on 2016/01/12 11:50:49 UTC, 2 replies.
- Re: Mllib Word2Vec vector representations are very high in value - posted by Nick Pentreath <ni...@gmail.com> on 2016/01/12 14:33:49 UTC, 0 replies.
- ROSE: Spark + R on the JVM. - posted by David <th...@protonmail.com> on 2016/01/12 16:03:36 UTC, 9 replies.
- ROSE: Spark + R on the JVM, now available. - posted by David Russell <th...@protonmail.com> on 2016/01/12 17:26:34 UTC, 0 replies.
- How to optimiz and make this code faster using coalesce(1) and mapPartitionIndex - posted by unk1102 <um...@gmail.com> on 2016/01/12 18:28:15 UTC, 1 replies.
- Big data job only finishes with Legacy memory management - posted by Sa...@wellsfargo.com on 2016/01/12 19:15:16 UTC, 0 replies.
- Eigenvalue solver - posted by Lydia Ickler <ic...@googlemail.com> on 2016/01/12 19:28:02 UTC, 0 replies.
- Re: Enabling mapreduce.input.fileinputformat.list-status.num-threads in Spark? - posted by Alex Nastetsky <al...@vervemobile.com> on 2016/01/12 19:55:58 UTC, 2 replies.
- How to view the RDD data based on Partition - posted by Gokula Krishnan D <em...@gmail.com> on 2016/01/12 20:06:18 UTC, 3 replies.
- How to change the no of cores assigned for a Submitted Job - posted by Ashish Soni <as...@gmail.com> on 2016/01/12 21:27:30 UTC, 0 replies.
- [Spark SQL]: Issues with writing dataframe with Append Mode to Parquet - posted by Jerry Lam <ch...@gmail.com> on 2016/01/12 22:11:13 UTC, 2 replies.
- rdd join very slow when rdd created from data frame - posted by Koert Kuipers <ko...@tresata.com> on 2016/01/12 23:16:58 UTC, 2 replies.
- Read HDFS file from an executor(closure) - posted by Udit Mehta <um...@groupon.com.INVALID> on 2016/01/13 01:17:10 UTC, 0 replies.
- FPGrowth does not handle large result sets - posted by Ritu Raj Tiwari <ri...@yahoo.com.INVALID> on 2016/01/13 01:43:45 UTC, 6 replies.
- 1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory - posted by Egor Pahomov <pa...@gmail.com> on 2016/01/13 02:01:12 UTC, 1 replies.
- failure to parallelize an RDD - posted by AlexG <sw...@gmail.com> on 2016/01/13 03:37:36 UTC, 2 replies.
- [Spark Streaming] "Could not compute split, block input-0-1452563923800 not found” when trying to recover from checkpoint data - posted by Collin Shi <sh...@aliyun.com> on 2016/01/13 03:38:43 UTC, 0 replies.
- RE: Dedup - posted by gpmacalalad <gp...@talas.ph> on 2016/01/13 03:43:25 UTC, 0 replies.
- ml.classification.NaiveBayesModel how to reshape theta - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/13 03:47:25 UTC, 0 replies.
- Re: - posted by Sabarish Sasidharan <sa...@manthan.com> on 2016/01/13 04:01:00 UTC, 0 replies.
- hiveContext.sql() - Join query fails silently - posted by Jins George <ji...@aeris.net> on 2016/01/13 06:10:28 UTC, 0 replies.
- spark job failure - akka error Association with remote system has failed - posted by vi...@wipro.com on 2016/01/13 07:48:12 UTC, 4 replies.
- Serializing DataSets - posted by Simon Hafner <re...@gmail.com> on 2016/01/13 08:20:12 UTC, 4 replies.
- Spark ignores SPARK_WORKER_MEMORY? - posted by Barak Yaish <ba...@gmail.com> on 2016/01/13 09:59:13 UTC, 1 replies.
- Kafka Streaming and partitioning - posted by ddav <da...@gmail.com> on 2016/01/13 10:40:36 UTC, 6 replies.
- Co-Partitioned Joins - posted by ddav <da...@gmail.com> on 2016/01/13 10:57:53 UTC, 0 replies.
- Concurrent Read of Accumulator's Value - posted by Kira <me...@gmail.com> on 2016/01/13 11:20:23 UTC, 1 replies.
- Merging compatible schemas on Spark 1.6.0 - posted by emlyn <em...@swiftkey.com> on 2016/01/13 12:06:46 UTC, 0 replies.
- Is it possible to use SparkSQL JDBC ThriftServer without Hive - posted by "angela.whelan" <an...@synchronoss.com> on 2016/01/13 12:37:08 UTC, 4 replies.
- Spark Cassandra Java Connector: records missing despite consistency=ALL - posted by Dennis Birkholz <bi...@pubgrade.com> on 2016/01/13 13:17:59 UTC, 2 replies.
- Error in Spark Executors when trying to read HBase table from Spark with Kerberos enabled - posted by Vinay Kashyap <vi...@gmail.com> on 2016/01/13 14:09:40 UTC, 1 replies.
- Error connecting to temporary derby metastore used by Spark, when running multiple jobs on the same SparkContext - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/13 14:28:26 UTC, 0 replies.
- Read Accumulator value while running - posted by Kira <me...@gmail.com> on 2016/01/13 14:43:34 UTC, 3 replies.
- Spark Thrift Server 2 problem - posted by Бобров Виктор <ma...@bk.ru> on 2016/01/13 15:11:48 UTC, 0 replies.
- Re: ml.classification.NaiveBayesModel how to reshape theta - posted by Yanbo Liang <yb...@gmail.com> on 2016/01/13 15:29:09 UTC, 1 replies.
- Spark 1.6 and Application History not working correctly - posted by Darin McBeath <dd...@yahoo.com.INVALID> on 2016/01/13 15:29:34 UTC, 3 replies.
- How to make Dataset api as fast as DataFrame - posted by Arkadiusz Bicz <ar...@gmail.com> on 2016/01/13 15:39:59 UTC, 2 replies.
- distributeBy using advantage of HDFS or RDD partitioning - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/13 16:09:46 UTC, 1 replies.
- Need 'Learning Spark' Partner - posted by King sami <kg...@gmail.com> on 2016/01/13 16:20:11 UTC, 0 replies.
- How to get the working directory in executor - posted by Byron Wang <op...@gmail.com> on 2016/01/13 17:01:56 UTC, 2 replies.
- Running window functions in spark dataframe - posted by rakesh sharma <ra...@hotmail.com> on 2016/01/13 18:05:43 UTC, 0 replies.
- Re: Spark SQL UDF with Struct input parameters - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/13 18:22:26 UTC, 0 replies.
- yarn-client: SparkSubmitDriverBootstrapper not found in yarn client mode (1.6.0) - posted by Lin Zhao <li...@exabeam.com> on 2016/01/13 18:31:40 UTC, 5 replies.
- Optimized way to multiply two large matrices and save output using Spark and Scala - posted by "Devi P.V" <de...@gmail.com> on 2016/01/13 19:16:07 UTC, 1 replies.
- Sending large objects to specific RDDs - posted by Daniel Imberman <da...@gmail.com> on 2016/01/13 20:29:49 UTC, 9 replies.
- automatically unpersist RDDs which are not used for 24 hours? - posted by Alexander Pivovarov <ap...@gmail.com> on 2016/01/13 20:36:38 UTC, 1 replies.
- Best practice for retrieving over 1 million files from S3 - posted by Darin McBeath <dd...@yahoo.com.INVALID> on 2016/01/13 20:42:38 UTC, 3 replies.
- SQL UDF problem (with re to types) - posted by raghukiran <ra...@gmail.com> on 2016/01/13 20:58:30 UTC, 7 replies.
- Re: Exception in Spark-sql insertIntoJDBC command - posted by RichG <ri...@riseinteractive.com> on 2016/01/13 22:48:02 UTC, 0 replies.
- trouble calculating TF-IDF data type mismatch: '(tf * idf)' requires numeric type, not vector; - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/13 23:52:30 UTC, 1 replies.
- Random Forest FeatureImportance throwing NullPointerException - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2016/01/14 00:29:33 UTC, 3 replies.
- Hive is unable to avro file written by spark avro - posted by Siva <sb...@gmail.com> on 2016/01/14 02:20:15 UTC, 1 replies.
- spark streaming context trigger invoke stop why? - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2016/01/14 04:24:12 UTC, 0 replies.
- Re: spark streaming context trigger invoke stop why? - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2016/01/14 04:25:34 UTC, 0 replies.
- Re: Re: spark streaming context trigger invoke stop why? - posted by Yogesh Mahajan <ym...@snappydata.io> on 2016/01/14 05:26:13 UTC, 1 replies.
- Spark on YARN job continuously reports "Application does not exist in cache" - posted by Prabhu Joseph <pr...@gmail.com> on 2016/01/14 06:01:03 UTC, 0 replies.
- [discuss] dropping Hadoop 2.2 and 2.3 support in Spark 2.0? - posted by Reynold Xin <rx...@databricks.com> on 2016/01/14 07:29:09 UTC, 1 replies.
- Usage of SparkContext within a Web container - posted by praveen S <my...@gmail.com> on 2016/01/14 08:44:27 UTC, 1 replies.
- Re: Re: spark streaming context trigger invoke stop why? - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2016/01/14 08:45:02 UTC, 2 replies.
- NPE when using Joda DateTime - posted by "Spencer, Alex (Santander)" <Al...@santander.co.uk.INVALID> on 2016/01/14 14:01:04 UTC, 12 replies.
- Spark and HBase RDD join/get - posted by Kristoffer Sjögren <st...@gmail.com> on 2016/01/14 14:04:00 UTC, 2 replies.
- Spark SQL . How to enlarge output rows ? - posted by Eli Super <el...@gmail.com> on 2016/01/14 14:09:20 UTC, 8 replies.
- code hangs in local master mode - posted by Kai Wei <we...@pythian.com> on 2016/01/14 15:01:34 UTC, 2 replies.
- Spark 1.5.2 streaming driver in YARN cluster mode on Hadoop 2.6 (on EMR 4.2) restarts after stop - posted by Roberto Coluccio <ro...@gmail.com> on 2016/01/14 17:57:44 UTC, 0 replies.
- strange behavior in spark yarn-client mode - posted by Sanjeev Verma <sa...@gmail.com> on 2016/01/14 19:17:51 UTC, 2 replies.
- Can we use localIterator when we need to process data in one partition? - posted by unk1102 <um...@gmail.com> on 2016/01/14 19:41:10 UTC, 0 replies.
- DataFrameWriter on partitionBy for parquet eat all RAM - posted by Arkadiusz Bicz <ar...@gmail.com> on 2016/01/14 20:31:57 UTC, 3 replies.
- Set Hadoop User in Spark Shell - posted by Daniel Valdivia <ho...@danielvaldivia.com> on 2016/01/14 21:32:57 UTC, 0 replies.
- How to bind webui to localhost? - posted by Zee Chen <ze...@gmail.com> on 2016/01/14 23:51:11 UTC, 2 replies.
- Spark Streaming: custom actor receiver losing vast majority of data - posted by Lin Zhao <li...@exabeam.com> on 2016/01/15 01:06:16 UTC, 4 replies.
- Using JDBC clients with "Spark on Hive" - posted by sdevashis <sd...@gmail.com> on 2016/01/15 02:15:23 UTC, 2 replies.
- Understanding Spark Rebalancing - posted by Pedro Rodriguez <sk...@gmail.com> on 2016/01/15 02:40:22 UTC, 0 replies.
- Re: Re: Re: spark streaming context trigger invoke stop why? - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2016/01/15 07:20:13 UTC, 1 replies.
- livy test problem: Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project livy-spark_2.10: There are test failures - posted by Ruslan Dautkhanov <da...@gmail.com> on 2016/01/15 07:55:40 UTC, 0 replies.
- DataFrame partitionBy to a single Parquet file (per partition) - posted by Patrick McGloin <mc...@gmail.com> on 2016/01/15 08:48:07 UTC, 3 replies.
- AIC in Linear Regression in ml pipeline - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/01/15 10:20:30 UTC, 1 replies.
- [Spark 1.6][Streaming] About the behavior of mapWithState - posted by Terry Hoo <hu...@gmail.com> on 2016/01/15 10:21:24 UTC, 2 replies.
- spark source Intellij - posted by Sanjeev Verma <sa...@gmail.com> on 2016/01/15 11:19:17 UTC, 1 replies.
- sqlContext.cacheTable("tableName") vs dataFrame.cache() - posted by George Sigletos <si...@textkernel.nl> on 2016/01/15 14:00:42 UTC, 3 replies.
- simultaneous actions - posted by Kira <me...@gmail.com> on 2016/01/15 14:52:02 UTC, 17 replies.
- Spark App -Yarn-Cluster-Mode ===> Hadoop_conf_**.zip file. - posted by Siddharth Ubale <si...@syncoms.com> on 2016/01/15 14:58:28 UTC, 4 replies.
- Stacking transformations and using intermediate results in the next transformation - posted by Richard Siebeling <rs...@gmail.com> on 2016/01/15 15:27:16 UTC, 0 replies.
- jobs much slower in cluster mode vs local - posted by Sa...@wellsfargo.com on 2016/01/15 15:28:46 UTC, 3 replies.
- Serialization stack error - posted by beeshma r <be...@gmail.com> on 2016/01/15 15:37:30 UTC, 3 replies.
- Feature importance for RandomForestRegressor in Spark 1.5 - posted by Scott Imig <si...@richrelevance.com> on 2016/01/15 17:06:28 UTC, 2 replies.
- Spark Streaming: routing by key without groupByKey - posted by Lin Zhao <li...@exabeam.com> on 2016/01/15 18:48:19 UTC, 0 replies.
- has any one implemented TF_IDF using ML transformers? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/15 23:10:29 UTC, 0 replies.
- Multi tenancy, REST and MLlib - posted by feribg <fe...@gmail.com> on 2016/01/16 00:00:44 UTC, 1 replies.
- spark.master overwritten in standalone plus cluster deploy-mode - posted by shanson <sh...@bloomberg.net> on 2016/01/16 00:48:36 UTC, 0 replies.
- How To Save TF-IDF Model In PySpark - posted by Asim Jalis <as...@gmail.com> on 2016/01/16 01:02:21 UTC, 2 replies.
- Consuming commands from a queue - posted by "Afshartous, Nick" <na...@turbine.com> on 2016/01/16 01:25:22 UTC, 2 replies.
- Compiling only MLlib? - posted by Colin Woodbury <co...@gmail.com> on 2016/01/16 03:13:24 UTC, 2 replies.
- Executor initialize before all resources are ready - posted by Byron Wang <op...@gmail.com> on 2016/01/16 04:08:56 UTC, 1 replies.
- Spark streaming: Fixed time aggregation & handling driver failures - posted by ffarozan <ff...@gmail.com> on 2016/01/16 04:13:46 UTC, 2 replies.
- Re: Re: Re: Re: spark streaming context trigger invoke stop why? - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2016/01/16 09:02:55 UTC, 1 replies.
- spark job server - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2016/01/16 14:33:52 UTC, 2 replies.
- ClassNotFoundException interpreting a Spark job - posted by milad bourhani <mi...@gmail.com> on 2016/01/16 15:24:47 UTC, 0 replies.
- How to apply mapPartitionsWithIndex to an emptyRDD? - posted by LINChen <m2...@outlook.com> on 2016/01/16 16:52:00 UTC, 0 replies.
- Re: has any one implemented TF_IDF using ML transformers? - posted by Yanbo Liang <yb...@gmail.com> on 2016/01/17 09:34:10 UTC, 4 replies.
- Converting CSV files to Avro - posted by Gideon <gi...@volcanodata.com> on 2016/01/17 12:46:34 UTC, 1 replies.
- Incorrect timeline for Scheduling Delay in Streaming page in web UI? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/01/17 13:49:30 UTC, 2 replies.
- How to tunning my spark application. - posted by 张峻 <ju...@me.com> on 2016/01/17 15:22:33 UTC, 3 replies.
- Reuse Executor JVM across different JobContext - posted by Jia Zou <ja...@gmail.com> on 2016/01/17 16:29:05 UTC, 12 replies.
- Spark Streaming: BatchDuration and Processing time - posted by pyspark2555 <sc...@gmail.com> on 2016/01/17 17:32:54 UTC, 3 replies.
- Spark Streaming: Does mapWithState implicitly partition the dsteram? - posted by Lin Zhao <li...@exabeam.com> on 2016/01/17 20:00:15 UTC, 1 replies.
- Running out of memory locally launching multiple spark jobs using spark yarn / submit from shell script. - posted by Colin Kincaid Williams <di...@uw.edu> on 2016/01/17 21:58:59 UTC, 0 replies.
- Monitoring Spark with Ganglia on ElCapo - posted by william tellme <wi...@gmail.com> on 2016/01/17 22:38:05 UTC, 0 replies.
- Total Task size exception in Spark 1.6.0 when writing a DataFrame - posted by Night Wolf <ni...@gmail.com> on 2016/01/18 06:07:20 UTC, 0 replies.
- Spark + Sentry + Kerberos don't add up? - posted by Ruslan Dautkhanov <da...@gmail.com> on 2016/01/18 07:04:17 UTC, 2 replies.
- Extracting p values in Logistic regression using mllib scala - posted by Chandan Verma <ch...@citiustech.com> on 2016/01/18 09:45:02 UTC, 1 replies.
- Re: Spark Streaming on mesos - posted by Iulian Dragoș <iu...@typesafe.com> on 2016/01/18 11:08:57 UTC, 0 replies.
- Is there a test like MiniCluster example in Spark just like hadoop ? - posted by zml张明磊 <mi...@Ctrip.com> on 2016/01/18 11:14:11 UTC, 0 replies.
- spark 1.6.0 on ec2 doesn't work - posted by Oleg Ruchovets <or...@gmail.com> on 2016/01/18 11:51:56 UTC, 8 replies.
- How to call a custom function from GroupByKey which takes Iterable[Row] as input and returns a Map[Int,String] as output in scala - posted by Neha Mehta <ne...@gmail.com> on 2016/01/18 12:47:11 UTC, 3 replies.
- Calling SparkContext methods in scala Future - posted by Marco <ma...@gmail.com> on 2016/01/18 15:27:57 UTC, 4 replies.
- Re: spark-1.2.0--standalone-ha-zookeeper - posted by doctorx <ra...@gmail.com> on 2016/01/18 15:47:42 UTC, 5 replies.
- Re: Is there a test like MiniCluster example in Spark just like hadoop ? - posted by Ted Yu <yu...@gmail.com> on 2016/01/18 16:13:00 UTC, 0 replies.
- spark ml Dataframe vs Labeled Point RDD Mllib speed - posted by jarias <ja...@elrocin.es> on 2016/01/18 16:46:34 UTC, 0 replies.
- spark random forest regressor : argument minInstancesPerNode not accepted - posted by Christopher Bourez <ch...@gmail.com> on 2016/01/18 16:56:25 UTC, 0 replies.
- [Spark-SQL] from_unixtime with user-specified timezone - posted by Jerry Lam <ch...@gmail.com> on 2016/01/18 18:39:20 UTC, 3 replies.
- Spark SQL create table - posted by raghukiran <ra...@gmail.com> on 2016/01/18 18:57:50 UTC, 6 replies.
- Spark Summit East - Full Schedule Available - posted by Scott walent <sc...@gmail.com> on 2016/01/18 19:55:05 UTC, 0 replies.
- Number of CPU cores for a Spark Streaming app in Standalone mode - posted by radoburansky <ra...@gmail.com> on 2016/01/18 21:13:06 UTC, 2 replies.
- using spark context in map function Task not serializable error - posted by gpatcham <gp...@gmail.com> on 2016/01/18 21:29:10 UTC, 9 replies.
- Contrib to Docs: Re: SparkContext SyntaxError: invalid syntax - posted by Jim Lohse <sp...@megalearningllc.com> on 2016/01/18 21:46:28 UTC, 0 replies.
- trouble using eclipse to view spark source code - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/18 22:04:15 UTC, 3 replies.
- building spark 1.6 throws error Rscript: command not found - posted by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/01/18 22:22:49 UTC, 2 replies.
- is recommendProductsForUsers available in ALS? - posted by Roberto Pagliari <ro...@asos.com> on 2016/01/18 23:18:32 UTC, 1 replies.
- PySpark Broadcast of User Defined Class No Work? - posted by efwalkermit <ef...@alum.mit.edu> on 2016/01/19 00:34:47 UTC, 1 replies.
- Spark Streaming - Latest batch-time can't keep up with current time - posted by Collin Shi <sh...@aliyun.com> on 2016/01/19 04:18:35 UTC, 0 replies.
- rdd.foreach return value - posted by charles li <ch...@gmail.com> on 2016/01/19 04:34:06 UTC, 8 replies.
- SparkR with Hive integration - posted by Peter Zhang <zh...@gmail.com> on 2016/01/19 05:23:57 UTC, 3 replies.
- when enable kerberos in hdp, the spark does not work - posted by 李振 <li...@163.com> on 2016/01/19 08:39:58 UTC, 1 replies.
- Re: Spark 1.6.0, yarn-shuffle - posted by johd <jo...@svenskaspel.se> on 2016/01/19 08:40:06 UTC, 0 replies.
- a problem about using UDF at sparksql - posted by 喜之郎 <25...@qq.com> on 2016/01/19 08:56:57 UTC, 0 replies.
- how to save matrix result to file - posted by zhangjp <59...@qq.com> on 2016/01/19 09:33:20 UTC, 0 replies.
- Different executor memory for different nodes - posted by hemangshah <he...@gmail.com> on 2016/01/19 12:20:50 UTC, 0 replies.
- storing query object - posted by Gourav Sengupta <go...@gmail.com> on 2016/01/19 12:24:44 UTC, 4 replies.
- Spark Dataset doesn't have api for changing columns - posted by Milad khajavi <kh...@gmail.com> on 2016/01/19 12:42:35 UTC, 2 replies.
- spark yarn client mode - posted by Sanjeev Verma <sa...@gmail.com> on 2016/01/19 12:43:45 UTC, 1 replies.
- RDD immutability - posted by ddav <da...@gmail.com> on 2016/01/19 13:14:58 UTC, 5 replies.
- Parquet write optimization by row group size config - posted by Pavel Plotnikov <pa...@team.wrike.com> on 2016/01/19 13:43:04 UTC, 5 replies.
- Is there a way to co-locate partitions from two partitioned RDDs? - posted by nwali <no...@utbm.fr> on 2016/01/19 13:55:56 UTC, 0 replies.
- Can I configure Spark on multiple nodes using local filesystem on each node? - posted by Jia Zou <ja...@gmail.com> on 2016/01/19 15:39:47 UTC, 1 replies.
- Split columns in RDD - posted by Richard Siebeling <rs...@gmail.com> on 2016/01/19 16:17:51 UTC, 6 replies.
- can we create dummy variables from categorical variables, using sparkR - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/01/19 16:34:07 UTC, 2 replies.
- RangePartitioning - posted by ddav <da...@gmail.com> on 2016/01/19 17:27:26 UTC, 0 replies.
- Concurrent Spark jobs - posted by emlyn <em...@swiftkey.com> on 2016/01/19 17:58:42 UTC, 4 replies.
- Appending filename information to RDD initialized by sc.textFile - posted by Femi Anthony <fe...@gmail.com> on 2016/01/19 19:18:59 UTC, 2 replies.
- Spark SQL -Hive transactions support - posted by hnagar <he...@mobiusws.com> on 2016/01/19 20:32:18 UTC, 3 replies.
- dataframe access hive complex type - posted by pth001 <Pa...@uni.no> on 2016/01/19 21:18:10 UTC, 0 replies.
- Re: Docker/Mesos with Spark - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2016/01/19 21:28:17 UTC, 4 replies.
- GraphX: Easy way to build fully connected grid-graph - posted by "benjamin.naujoks" <na...@gmail.com> on 2016/01/19 21:50:42 UTC, 0 replies.
- OOM on yarn-cluster mode - posted by Julio Antonio Soto <ju...@esbet.es> on 2016/01/19 22:15:20 UTC, 2 replies.
- is Hbase Scan really need thorough Get (Hbase+solr+spark) - posted by beeshma r <be...@gmail.com> on 2016/01/20 00:09:49 UTC, 3 replies.
- spark dataframe jdbc read/write using dbcp connection pool - posted by "fightfate@163.com" <fi...@163.com> on 2016/01/20 03:11:03 UTC, 5 replies.
- process of executing a program in a distributed environment without hadoop - posted by Kamaruddin <sk...@gmail.com> on 2016/01/20 06:45:23 UTC, 2 replies.
- Redundant common columns of nature full outer join - posted by Zhong Wang <wa...@gmail.com> on 2016/01/20 07:51:27 UTC, 1 replies.
- How to use scala.math.Ordering in java - posted by ddav <da...@gmail.com> on 2016/01/20 10:03:35 UTC, 2 replies.
- How to query data in tachyon with spark-sql - posted by Sea <26...@qq.com> on 2016/01/20 11:06:45 UTC, 0 replies.
- Container exited with a non-zero exit code 1-SparkJOb on YARN - posted by Siddharth Ubale <si...@syncoms.com> on 2016/01/20 13:29:47 UTC, 3 replies.
- Scala MatchError in Spark SQL - posted by raghukiran <ra...@gmail.com> on 2016/01/20 15:07:42 UTC, 8 replies.
- Cache table as - posted by Younes Naguib <Yo...@tritondigital.com> on 2016/01/20 16:18:18 UTC, 0 replies.
- Dataframe, Spark SQL - Drops First 8 Characters of String on Amazon EMR - posted by awzurn <aw...@gmail.com> on 2016/01/20 16:35:40 UTC, 5 replies.
- updateStateByKey not persisting in Spark 1.5.1 - posted by Brian London <br...@gmail.com> on 2016/01/20 16:55:13 UTC, 4 replies.
- Getting all field value as Null while reading Hive Table with Partition - posted by Bijay Pathak <bi...@cloudwick.com> on 2016/01/20 17:25:50 UTC, 0 replies.
- launching app using SparkLauncher - posted by se...@nomura.com on 2016/01/20 17:56:02 UTC, 0 replies.
- Using Spark, SparkR and Ranger, please help. - posted by Julien Carme <ju...@gmail.com> on 2016/01/20 18:42:06 UTC, 1 replies.
- I need help mapping a PairRDD solution to Dataset - posted by Steve Lewis <lo...@gmail.com> on 2016/01/20 19:26:37 UTC, 3 replies.
- How to debug join operations on a cluster. - posted by Borislav Iordanov <bi...@liquidoperations.com> on 2016/01/20 19:31:19 UTC, 0 replies.
- Looking for the best tool that support structured DB and fast text indexing and searching with Spark - posted by Khaled Al-Gumaei <kh...@gmail.com> on 2016/01/20 19:32:41 UTC, 0 replies.
- visualize data from spark streaming - posted by patcharee <Pa...@uni.no> on 2016/01/20 20:54:11 UTC, 3 replies.
- [Spark Streaming][Problem with DataFrame UDFs] - posted by jpocalan <jp...@gmail.com> on 2016/01/20 22:53:37 UTC, 3 replies.
- trouble implementing complex transformer in java that can be used with Pipeline. Scala to Java porting problem - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/21 01:05:56 UTC, 1 replies.
- Re: trouble implementing complex transformer in java that can be used with Pipeline. Scala to Java porting problem - posted by Kevin Mellott <ke...@gmail.com> on 2016/01/21 01:34:51 UTC, 1 replies.
- retrieve cell value from a rowMatrix. - posted by Srivathsan Srinivas <sr...@gmail.com> on 2016/01/21 02:04:31 UTC, 1 replies.
- HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode - posted by Ajinkya Kale <ka...@gmail.com> on 2016/01/21 02:41:27 UTC, 8 replies.
- --driver-java-options not support multiple JVM configuration ? - posted by "ouruia@cnsuning.com" <ou...@cnsuning.com> on 2016/01/21 04:38:34 UTC, 0 replies.
- Re: --driver-java-options not support multiple JVM configuration ? - posted by Marcelo Vanzin <va...@cloudera.com> on 2016/01/21 05:09:26 UTC, 2 replies.
- best practice : how to manage your Spark cluster ? - posted by charles li <ch...@gmail.com> on 2016/01/21 06:33:56 UTC, 1 replies.
- Re: spark task scheduling delay - posted by Renu Yadav <yr...@gmail.com> on 2016/01/21 06:38:21 UTC, 1 replies.
- Re: retrieve cell value from a rowMatrix. - posted by zhangjp <59...@qq.com> on 2016/01/21 07:08:05 UTC, 0 replies.
- a lot of warnings when build spark 1.6.0 - posted by Eli Super <el...@gmail.com> on 2016/01/21 08:08:30 UTC, 2 replies.
- Passing binding variable in query used in Data Source API - posted by satish chandra j <js...@gmail.com> on 2016/01/21 12:02:37 UTC, 2 replies.
- Spark Streaming Write Ahead Log (WAL) not replaying data after restart - posted by Patrick McGloin <mc...@gmail.com> on 2016/01/21 12:32:27 UTC, 2 replies.
- Number of executors in Spark - Kafka - posted by Guillermo Ortiz <ko...@gmail.com> on 2016/01/21 12:35:17 UTC, 1 replies.
- spark job submission on yarn-cluster mode failing - posted by Soni spark <so...@gmail.com> on 2016/01/21 12:41:55 UTC, 4 replies.
- Spark 1.6 ignoreNulls in first/last aggregate functions - posted by emlyn <em...@swiftkey.com> on 2016/01/21 13:31:14 UTC, 1 replies.
- question about query SparkSQL - posted by Eli Super <el...@gmail.com> on 2016/01/21 13:54:13 UTC, 0 replies.
- How to setup a long running spark streaming job with continuous window refresh - posted by Santoshakhilesh <sa...@huawei.com> on 2016/01/21 13:59:16 UTC, 1 replies.
- Spark Yarn executor memory overhead content - posted by Olivier Devoisin <ol...@content-square.fr> on 2016/01/21 14:42:29 UTC, 1 replies.
- Client versus cluster mode - posted by "Afshartous, Nick" <na...@turbine.com> on 2016/01/21 14:53:28 UTC, 1 replies.
- cast column string -> timestamp in Parquet file - posted by Eli Super <el...@gmail.com> on 2016/01/21 15:17:03 UTC, 2 replies.
- Spark job stops after a while. - posted by Guillermo Ortiz <ko...@gmail.com> on 2016/01/21 15:50:21 UTC, 5 replies.
- java.lang.ArrayIndexOutOfBoundsException when attempting broadcastjoin - posted by "sebastian.piu" <se...@gmail.com> on 2016/01/21 15:59:15 UTC, 2 replies.
- 10hrs of Scheduler Delay - posted by "Sanders, Isaac B" <sa...@rose-hulman.edu> on 2016/01/21 16:35:01 UTC, 20 replies.
- No plan for BroadcastHint when attempting broadcastjoin - posted by Ted Yu <yu...@gmail.com> on 2016/01/21 16:36:24 UTC, 1 replies.
- Recovery for Spark Streaming Kafka Direct with OffsetOutOfRangeException - posted by Dan Dutrow <da...@gmail.com> on 2016/01/21 18:11:15 UTC, 1 replies.
- [ANNOUNCE] Apache Nutch 2.3.1 Release - posted by lewis john mcgibbney <le...@apache.org> on 2016/01/21 18:37:51 UTC, 0 replies.
- Date / time stuff with spark. - posted by Andrew Holway <an...@otternetworks.de> on 2016/01/21 20:24:58 UTC, 5 replies.
- TaskCommitDenied (Driver denied task commit) - posted by Arun Luthra <ar...@gmail.com> on 2016/01/21 23:02:56 UTC, 9 replies.
- Getting Coefficients of a logistic regression model for a pipelinemodel Spark ML library - posted by Vinayak Agrawal <vi...@gmail.com> on 2016/01/21 23:05:33 UTC, 1 replies.
- MemoryStore: Not enough space to cache broadcast_N in memory - posted by Arun Luthra <ar...@gmail.com> on 2016/01/21 23:10:28 UTC, 1 replies.
- General Question (Spark Hive integration ) - posted by "Balaraju.Kagidala Kagidala" <ba...@gmail.com> on 2016/01/22 03:37:41 UTC, 3 replies.
- Spark partition size tuning - posted by Jia Zou <ja...@gmail.com> on 2016/01/22 05:05:30 UTC, 4 replies.
- avg(df$column) not returning a value but just the text "Column avg" - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/01/22 07:23:59 UTC, 0 replies.
- Spark Streaming - Custom ReceiverInputDStream ( Custom Source) In java - posted by Nagu Kothapalli <na...@gmail.com> on 2016/01/22 08:12:37 UTC, 2 replies.
- spark-streaming with checkpointing: error with sparkOnHBase lib - posted by vinay gupta <vi...@yahoo.com.INVALID> on 2016/01/22 09:36:01 UTC, 1 replies.
- [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint - posted by Raju Bairishetti <ra...@apache.org> on 2016/01/22 10:03:23 UTC, 5 replies.
- SparkR works from command line but not from rstudio - posted by Sandeep Khurana <sa...@infoworks.io> on 2016/01/22 12:05:15 UTC, 2 replies.
- Application SUCCESS/FAILURE status using spark API - posted by Raghvendra Singh <ra...@gmail.com> on 2016/01/22 12:47:18 UTC, 0 replies.
- spark streaming input rate strange - posted by patcharee <Pa...@uni.no> on 2016/01/22 13:14:55 UTC, 1 replies.
- Re: retrieve cell value from a rowMatrix. - posted by zhangjp <59...@qq.com> on 2016/01/22 13:56:36 UTC, 0 replies.
- Spark Streaming : requirement failed: numRecords must not be negative - posted by "Afshartous, Nick" <na...@turbine.com> on 2016/01/22 16:31:49 UTC, 2 replies.
- looking for a spark admin consultant/contractor - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/22 18:14:53 UTC, 0 replies.
- Tool for Visualization /Plotting of K means cluster - posted by Ashutosh Kumar <km...@gmail.com> on 2016/01/22 18:29:08 UTC, 0 replies.
- Help understanding DAG for mapWithState - posted by Lin Zhao <li...@exabeam.com> on 2016/01/22 18:50:53 UTC, 0 replies.
- Disable speculative retry only for specific stages? - posted by Adam McElwee <ad...@mcelwee.me> on 2016/01/22 19:15:41 UTC, 1 replies.
- StackOverflow when computing MatrixFactorizationModel.recommendProductsForUsers - posted by Ram VISWANADHA <ra...@dailymotion.com> on 2016/01/22 19:25:41 UTC, 1 replies.
- Use KafkaRDD to Batch Process Messages from Kafka - posted by Charles Chao <ch...@bluecava.com> on 2016/01/22 20:30:11 UTC, 2 replies.
- First job is extremely slow due to executor heartbeat timeout (yarn-client) - posted by Zhong Wang <wa...@gmail.com> on 2016/01/22 22:09:58 UTC, 0 replies.
- Trouble dropping columns from a DataFrame that has other columns with dots in their names - posted by Joshua TAYLOR <jo...@gmail.com> on 2016/01/22 23:57:24 UTC, 6 replies.
- Spark Cassandra clusters - posted by vi...@wipro.com on 2016/01/23 02:37:48 UTC, 12 replies.
- Spark LDA - posted by Ilya Ganelin <il...@gmail.com> on 2016/01/23 02:52:13 UTC, 0 replies.
- Caching in Spark - posted by Sourabh Chandak <so...@gmail.com> on 2016/01/23 03:10:36 UTC, 0 replies.
- How to send a file to database using spark streaming - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/01/23 07:49:53 UTC, 1 replies.
- concurrent.RejectedExecutionException - posted by Yasemin Kaya <go...@gmail.com> on 2016/01/23 09:51:43 UTC, 1 replies.
- Spark not saving data to Hive - posted by Akhilesh Pathodia <pa...@gmail.com> on 2016/01/23 12:03:43 UTC, 0 replies.
- python - list objects in HDFS directory - posted by Andrew Holway <an...@otternetworks.de> on 2016/01/23 13:08:25 UTC, 1 replies.
- Re: Does filter on an RDD scan every data item ? - posted by nir <ni...@gmail.com> on 2016/01/23 14:26:08 UTC, 1 replies.
- How to efficiently Scan (not filter nor lookup) part of Paird RDD or Ordered RDD - posted by Nirav Patel <np...@xactlycorp.com> on 2016/01/23 14:48:41 UTC, 2 replies.
- Spark not writing data in Hive format - posted by Akhilesh Pathodia <pa...@gmail.com> on 2016/01/23 17:59:34 UTC, 0 replies.
- Clarification on Data Frames joins - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2016/01/23 18:54:20 UTC, 1 replies.
- Concatenating tables - posted by Andrew Holway <an...@otternetworks.de> on 2016/01/23 22:02:09 UTC, 2 replies.
- Spark RDD DAG behaviour understanding in case of checkpointing - posted by gaurav sharma <sh...@gmail.com> on 2016/01/23 22:47:58 UTC, 2 replies.
- Debug what is replication Level of which RDD - posted by gaurav sharma <sh...@gmail.com> on 2016/01/23 22:55:30 UTC, 1 replies.
- understanding iterative algorithms in Spark - posted by Raghava Mutharaju <m....@gmail.com> on 2016/01/24 01:48:01 UTC, 1 replies.
- Spark master takes more time with local[8] than local[1] - posted by jimitkr <ji...@softpath.net> on 2016/01/24 21:11:12 UTC, 2 replies.
- high CPU usage for acceptor and qtp threads - posted by "alberto.scolari" <al...@polimi.it> on 2016/01/25 01:56:09 UTC, 0 replies.
- show to save Matrix type result to hdfs file using java - posted by zhangjp <59...@qq.com> on 2016/01/25 03:34:50 UTC, 0 replies.
- Group by Dynamically - posted by Divya Gehlot <di...@gmail.com> on 2016/01/25 03:34:59 UTC, 0 replies.
- how to save Matrix type result to hdfs file using java - posted by zhangjp <59...@qq.com> on 2016/01/25 03:44:25 UTC, 0 replies.
- Re: How to query data in tachyon with spark-sql - posted by Gene Pang <ge...@gmail.com> on 2016/01/25 05:44:30 UTC, 0 replies.
- Re: how to save Matrix type result to hdfs file using java - posted by Yanbo Liang <yb...@gmail.com> on 2016/01/25 06:31:55 UTC, 0 replies.
- Re: how to save Matrix type result to hdfs file using java - posted by zhangjp <59...@qq.com> on 2016/01/25 07:53:37 UTC, 0 replies.
- NA value handling in sparkR - posted by Devesh Raj Singh <ra...@gmail.com> on 2016/01/25 08:05:46 UTC, 7 replies.
- Worker's BlockManager Folder not getting cleared - posted by Abhishek Anand <ab...@gmail.com> on 2016/01/25 08:14:48 UTC, 2 replies.
- how to build spark with out hive - posted by kevin <ki...@gmail.com> on 2016/01/25 09:19:23 UTC, 1 replies.
- RangePartitioning skewed data - posted by jluan <ja...@gmail.com> on 2016/01/25 09:46:48 UTC, 0 replies.
- SparkSQL : "select non null values from column" - posted by Eli Super <el...@gmail.com> on 2016/01/25 11:00:08 UTC, 1 replies.
- How to discretize Continuous Variable with Spark DataFrames - posted by Eli Super <el...@gmail.com> on 2016/01/25 11:34:18 UTC, 3 replies.
- SparkSQL return all null fields when FIELDS TERMINATED BY '\t' and have a partition. - posted by Liu Yiding <od...@gmail.com> on 2016/01/25 12:50:51 UTC, 0 replies.
- Undefined job output-path error in Spark on hive - posted by Akhilesh Pathodia <pa...@gmail.com> on 2016/01/25 13:00:28 UTC, 0 replies.
- Getting top distinct strings from arraylist - posted by Patrick Plaatje <pa...@bazana.com> on 2016/01/25 13:21:18 UTC, 0 replies.
- [Spark] Reading avro file in Spark 1.3.0 - posted by diplomatic Guru <di...@gmail.com> on 2016/01/25 13:38:18 UTC, 1 replies.
- Re: Launching EC2 instances with Spark compiled for Scala 2.11 - posted by Nuno Santos <nf...@gmail.com> on 2016/01/25 13:38:40 UTC, 1 replies.
- streaming textFileStream problem - got only ONE line - posted by patcharee <Pa...@uni.no> on 2016/01/25 15:30:40 UTC, 3 replies.
- hivethriftserver2 problems on upgrade to 1.6.0 - posted by "james.green9@baesystems.com" <ja...@baesystems.com> on 2016/01/25 16:06:25 UTC, 0 replies.
- Sharing HiveContext in Spark JobServer / getOrCreate - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/25 16:22:59 UTC, 2 replies.
- bug for large textfiles on windows - posted by Christopher Bourez <ch...@gmail.com> on 2016/01/25 16:53:09 UTC, 5 replies.
- Determine Topic MetaData Spark Streaming Job - posted by Ashish Soni <as...@gmail.com> on 2016/01/25 17:31:51 UTC, 3 replies.
- a question about web ui log - posted by Philip Lee <ph...@gmail.com> on 2016/01/25 17:36:56 UTC, 4 replies.
- Spark DataFrame Catalyst - Another Oracle like query optimizer? - posted by Nirav Patel <np...@xactlycorp.com> on 2016/01/25 18:35:16 UTC, 2 replies.
- Running kafka consumer in local mode - error - connection timed out - posted by Supreeth <su...@gmail.com> on 2016/01/25 19:46:38 UTC, 3 replies.
- mapWithState and context start when checkpoint exists - posted by Andrey Yegorov <an...@gmail.com> on 2016/01/25 22:12:16 UTC, 2 replies.
- Datasets and columns - posted by Steve Lewis <lo...@gmail.com> on 2016/01/25 22:16:43 UTC, 3 replies.
- Can Spark read input data from HDFS centralized cache? - posted by Jia Zou <ja...@gmail.com> on 2016/01/25 22:23:43 UTC, 2 replies.
- Standalone scheduler issue - one job occupies the whole cluster somehow - posted by Mikhail Strebkov <st...@gmail.com> on 2016/01/25 22:57:24 UTC, 0 replies.
- Generic Dataset Aggregator - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/25 23:36:35 UTC, 1 replies.
- MLlib OneVsRest causing intermittent exceptions - posted by David Brooks <da...@whisk.co.uk> on 2016/01/26 00:06:28 UTC, 9 replies.
- [Spark Streaming] What determines the processing parallelism of DStream in SparkStreaming - posted by Collin Shi <sh...@aliyun.com> on 2016/01/26 04:10:49 UTC, 0 replies.
- spark-sql[1.4.0] not compatible hive sql when using in with date_sub or regexp_replace - posted by "ouruia@cnsuning.com" <ou...@cnsuning.com> on 2016/01/26 04:44:57 UTC, 0 replies.
- multi-threaded Spark jobs - posted by Elango Cheran <el...@gmail.com> on 2016/01/26 06:59:47 UTC, 2 replies.
- Why does DStream have a different StorageLevel than RDD ? - posted by "Sela, Amit" <AN...@paypal.com.INVALID> on 2016/01/26 09:02:09 UTC, 0 replies.
- hive1.2.1 on spark 1.5.2 - posted by kevin <ki...@gmail.com> on 2016/01/26 09:45:18 UTC, 0 replies.
- Write to S3 with server side encryption in KMS mode - posted by Nisrina Luthfiyati <ni...@gmail.com> on 2016/01/26 11:41:44 UTC, 2 replies.
- Regarding Off-heap memory - posted by Xiaoyu Ma <hz...@corp.netease.com> on 2016/01/26 13:20:24 UTC, 1 replies.
- Re: Spark task hangs infinitely when accessing S3 from AWS - posted by Erisa Dervishi <er...@gmail.com> on 2016/01/26 13:41:41 UTC, 7 replies.
- Re: cartesian in the loop, runtime grows - posted by efa <je...@gmail.com> on 2016/01/26 14:12:19 UTC, 0 replies.
- py4j.protocol.Py4JJavaError when selecting nested column in dataframe using select statement - posted by Lior Baber <li...@gettaxi.com> on 2016/01/26 14:51:05 UTC, 0 replies.
- How to migrate spark code to spark streaming ? - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/01/26 15:10:40 UTC, 0 replies.
- RE: how to correctly run scala script using spark-shell through stdin (spark v1.0.0) - posted by fernandrez1987 <an...@wellsfargo.com> on 2016/01/26 15:47:34 UTC, 7 replies.
- Spark ODBC Driver Windows Desktop problem - posted by Я <ma...@bk.ru> on 2016/01/26 16:04:35 UTC, 1 replies.
- org.netezza.error.NzSQLException: ERROR: Invalid datatype - TEXT - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/01/26 16:26:25 UTC, 1 replies.
- Re: org.netezza.error.NzSQLException: ERROR: Invalid datatype - TEXT - posted by Ted Yu <yu...@gmail.com> on 2016/01/26 16:49:59 UTC, 1 replies.
- ctas fails with "No plan for CreateTableAsSelect" - posted by Younes Naguib <Yo...@tritondigital.com> on 2016/01/26 17:00:54 UTC, 8 replies.
- Re: Off-heap memory usage of Spark Executors keeps increasing - posted by nir <ni...@gmail.com> on 2016/01/26 17:31:43 UTC, 0 replies.
- Stage shows incorrect output size - posted by Noorul Islam K M <no...@noorul.com> on 2016/01/26 17:40:42 UTC, 0 replies.
- NPE from sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/01/26 18:15:02 UTC, 2 replies.
- NoSuchMethod from transitive dependency jackson-databind in MaxMind GeoIP2 - posted by asdf zxcv <be...@gmail.com> on 2016/01/26 18:34:57 UTC, 5 replies.
- Scala closure exceeds ByteArrayOutputStream limit (~2gb) - posted by Joel Keller <jk...@miovision.com> on 2016/01/26 18:36:14 UTC, 0 replies.
- FAIR scheduler in Spark Streaming - posted by Sebastian Piu <se...@gmail.com> on 2016/01/26 18:57:57 UTC, 2 replies.
- Terminating Spark Steps in AWS - posted by Daniel Imberman <da...@gmail.com> on 2016/01/26 19:05:11 UTC, 2 replies.
- Need a sample code to load XML files into cassandra database using spark streaming - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/01/26 19:10:12 UTC, 1 replies.
- Window range in Spark - posted by Krishna <re...@gmail.com> on 2016/01/26 19:49:45 UTC, 0 replies.
- Issue with spark-shell in yarn mode - posted by nd...@gmail.com on 2016/01/26 20:02:44 UTC, 0 replies.
- save rdd with gzip compresson but without .gz extension? - posted by Alexander Pivovarov <ap...@gmail.com> on 2016/01/26 20:09:17 UTC, 0 replies.
- Databricks Cloud vs AWS EMR - posted by Alex Nastetsky <al...@vervemobile.com> on 2016/01/26 20:55:41 UTC, 4 replies.
- Spark GraphX + TitanDB + Cassandra? - posted by Joe Bako <jb...@gracenote.com> on 2016/01/26 21:19:22 UTC, 1 replies.
- Spark SQL joins taking too long - posted by raghukiran <ra...@gmail.com> on 2016/01/26 21:41:16 UTC, 5 replies.
- withColumn - posted by naga sharathrayapati <sh...@gmail.com> on 2016/01/26 22:04:57 UTC, 2 replies.
- Issues with Long subtraction in an RDD when utilising tailrecursion - posted by Nkechi Achara <nk...@googlemail.com> on 2016/01/26 22:10:03 UTC, 2 replies.
- Spark Pattern and Anti-Pattern - posted by Daniel Schulz <da...@hotmail.com> on 2016/01/26 22:25:34 UTC, 1 replies.
- Spark 2.0.0 release plan - posted by Koert Kuipers <ko...@tresata.com> on 2016/01/26 23:00:39 UTC, 8 replies.
- Re: newAPIHadoopFile uses AWS credentials from other threads - posted by Wayne Song <wa...@gmail.com> on 2016/01/26 23:49:00 UTC, 0 replies.
- Re: Spark LDA model reuse with new set of data - posted by Joseph Bradley <jo...@databricks.com> on 2016/01/27 00:44:33 UTC, 0 replies.
- Spark, Mesos, Docker and S3 - posted by Mao Geng <ma...@sumologic.com> on 2016/01/27 01:02:55 UTC, 11 replies.
- naive bayes results to not match published results - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/27 01:30:01 UTC, 0 replies.
- SQL - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2016/01/27 04:37:46 UTC, 0 replies.
- How to debug - posted by Anfernee Xu <an...@gmail.com> on 2016/01/27 05:18:41 UTC, 0 replies.
- How to debug ClassCastException: java.lang.String cannot be cast to java.lang.Long in SparkSQL - posted by Anfernee Xu <an...@gmail.com> on 2016/01/27 05:30:18 UTC, 1 replies.
- Re: mllib.remenmender.als issue - posted by Xiangrui Meng <me...@databricks.com> on 2016/01/27 05:54:02 UTC, 0 replies.
- Streaming: mapWithState "Error during Java deserialization." - posted by Lin Zhao <li...@exabeam.com> on 2016/01/27 06:27:53 UTC, 0 replies.
- ZlibFactor warning - posted by Eli Super <el...@gmail.com> on 2016/01/27 10:29:54 UTC, 1 replies.
- TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2 - posted by Jia Zou <ja...@gmail.com> on 2016/01/27 12:02:27 UTC, 5 replies.
- spark.kryo.classesToRegister - posted by amit tewari <am...@gmail.com> on 2016/01/27 12:13:55 UTC, 3 replies.
- [Problem Solved]Re: Spark partition size tuning - posted by Jia Zou <ja...@gmail.com> on 2016/01/27 12:16:58 UTC, 0 replies.
- Re: Generate Amplab queries set - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2016/01/27 13:02:38 UTC, 0 replies.
- JSON to SQL - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/01/27 14:33:09 UTC, 10 replies.
- help with enabling spark dynamic allocation - posted by varuni gang <va...@gmail.com> on 2016/01/27 16:00:25 UTC, 1 replies.
- Having issue with Spark SQL JDBC on hive table !!! - posted by "@Sanjiv Singh" <sa...@gmail.com> on 2016/01/27 16:07:23 UTC, 5 replies.
- Saving a pipeline model ? - posted by Vinayak Agrawal <vi...@gmail.com> on 2016/01/27 16:40:02 UTC, 0 replies.
- Neo4j and Spark/GraphX - posted by Sahil Sareen <sa...@gmail.com> on 2016/01/27 17:11:16 UTC, 0 replies.
- Storing JavaDStream into a hive table - posted by samrat <ye...@gmail.com> on 2016/01/27 17:27:14 UTC, 0 replies.
- how to run latest version of spark in old version of spark in cloudera cluster ? - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/01/27 17:28:59 UTC, 6 replies.
- spark streaming web ui not showing the events - direct kafka api - posted by vimal dinakaran <vi...@gmail.com> on 2016/01/27 18:14:56 UTC, 1 replies.
- Hive on Spark knobs - posted by Ruslan Dautkhanov <da...@gmail.com> on 2016/01/27 18:51:43 UTC, 2 replies.
- Using Spark in mixed Java/Scala project - posted by jeremycod <zo...@gmail.com> on 2016/01/27 19:07:47 UTC, 2 replies.
- Python UDFs - posted by Stefan Panayotov <sp...@msn.com> on 2016/01/27 19:38:07 UTC, 2 replies.
- Online Learning for MLLib Forest Ensembles - posted by Scott Imig <si...@richrelevance.com> on 2016/01/27 21:53:08 UTC, 0 replies.
- Re: hivethriftserver2 problems on upgrade to 1.6.0 - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/27 22:42:35 UTC, 0 replies.
- Escaping tabs and newlines not working - posted by Harshvardhan Chauhan <ha...@gumgum.com> on 2016/01/27 22:57:03 UTC, 1 replies.
- Is spark-ec2 going away? - posted by Sung Hwan Chung <co...@cs.stanford.edu> on 2016/01/27 23:07:11 UTC, 5 replies.
- Spark streaming flow control and back pressure - posted by Lin Zhao <li...@exabeam.com> on 2016/01/28 02:28:22 UTC, 3 replies.
- Maintain state outside rdd - posted by Krishna <re...@gmail.com> on 2016/01/28 03:03:11 UTC, 6 replies.
- corresponding sql for query against LocalRelation - posted by ey-chih chow <ey...@hotmail.com> on 2016/01/28 03:18:47 UTC, 1 replies.
- How data locality is honored when spark is running on yarn - posted by Todd <bi...@163.com> on 2016/01/28 03:50:50 UTC, 1 replies.
- GraphX can show graph? - posted by "Balachandar R.A." <ba...@gmail.com> on 2016/01/28 08:12:37 UTC, 3 replies.
- Compile error when compiling spark 2.0.0 snapshot code base in IDEA - posted by Todd <bi...@163.com> on 2016/01/28 08:31:24 UTC, 0 replies.
- can't find trackStateByKey in 1.6.0 jar? - posted by Sebastian Piu <se...@gmail.com> on 2016/01/28 10:51:24 UTC, 2 replies.
- Stream S3 server to Cassandra - posted by Sateesh Karuturi <sa...@gmail.com> on 2016/01/28 10:56:10 UTC, 1 replies.
- “java.io.IOException: Class not found” on long running Streaming application - posted by Patrick McGloin <mc...@gmail.com> on 2016/01/28 11:26:14 UTC, 0 replies.
- Explaination for info shown in UI - posted by Sachin Aggarwal <di...@gmail.com> on 2016/01/28 12:00:10 UTC, 0 replies.
- Why Spark-sql miss TableScanDesc.FILTER_EXPR_CONF_STR params when I move Hive table to Spark? - posted by 开心延年 <mu...@qq.com> on 2016/01/28 12:27:19 UTC, 0 replies.
- Re: Why Spark-sql miss TableScanDesc.FILTER_EXPR_CONF_STR params when I move Hive table to Spark? - posted by 开心延年 <mu...@qq.com> on 2016/01/28 13:28:42 UTC, 1 replies.
- Spark Distribution of Small Dataset - posted by Philip Lee <ph...@gmail.com> on 2016/01/28 13:41:49 UTC, 1 replies.
- spark-xml data source (com.databricks.spark.xml) not working with spark 1.6 - posted by Deenar Toraskar <de...@gmail.com> on 2016/01/28 16:27:06 UTC, 1 replies.
- Re: Tips for Spark's Random Forest slow performance - posted by Alexander Ratnikov <ra...@gmail.com> on 2016/01/28 17:16:59 UTC, 0 replies.
- Understanding Spark Task failures - posted by Patrick McGloin <mc...@gmail.com> on 2016/01/28 17:51:22 UTC, 2 replies.
- Parquet block size from spark-sql cli - posted by ubet <ul...@sonra.io> on 2016/01/28 18:16:44 UTC, 1 replies.
- Setting up data for columnsimilarity - posted by rcollich <rc...@gmail.com> on 2016/01/28 19:00:07 UTC, 0 replies.
- Streaming: LeaseExpiredException when writing checkpoint - posted by Lin Zhao <li...@exabeam.com> on 2016/01/28 19:42:04 UTC, 0 replies.
- streaming in 1.6.0 slower than 1.5.1 - posted by Jesse F Chen <jf...@us.ibm.com> on 2016/01/28 20:49:14 UTC, 3 replies.
- Broadcast join on multiple dataframes - posted by Srikanth <sr...@gmail.com> on 2016/01/28 21:26:57 UTC, 2 replies.
- How to write a custom window function? - posted by Benyi Wang <be...@gmail.com> on 2016/01/28 21:49:48 UTC, 1 replies.
- Problems when applying scheme to RDD - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/01/28 21:59:57 UTC, 0 replies.
- local class incompatible: stream classdesc serialVersionUID - posted by Jason Plurad <pl...@gmail.com> on 2016/01/28 22:38:44 UTC, 3 replies.
- Spark Caching Kafka Metadata - posted by asdf zxcv <be...@gmail.com> on 2016/01/29 01:07:43 UTC, 1 replies.
- Data not getting printed in Spark Streaming with print(). - posted by satyajit vegesna <sa...@gmail.com> on 2016/01/29 01:22:42 UTC, 1 replies.
- Getting Exceptions/WARN during random runs for same dataset - posted by Khusro Siddiqui <mk...@gmail.com> on 2016/01/29 02:13:14 UTC, 4 replies.
- building spark 1.6.0 fails - posted by "Carlile, Ken" <ca...@janelia.hhmi.org> on 2016/01/29 02:24:55 UTC, 2 replies.
- How to filter the isolated vertexes in Graphx - posted by "Zhang, Jingyu" <ji...@news.com.au> on 2016/01/29 02:49:07 UTC, 0 replies.
- Programmatically launching spark on yarn-client mode no longer works in spark 1.5.2 - posted by Nirav Patel <np...@xactlycorp.com> on 2016/01/29 03:22:34 UTC, 3 replies.
- looking for an easy way to count number of rows in JavaDStream - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/29 03:41:35 UTC, 1 replies.
- Spark 1.5.2 - Programmatically launching spark on yarn-client mode - posted by Nirav Patel <np...@xactlycorp.com> on 2016/01/29 04:36:44 UTC, 2 replies.
- Persisting of DataFrames in transformation workflows - posted by Gireesh Puthumana <gi...@augmentiq.in> on 2016/01/29 05:10:02 UTC, 0 replies.
- Spark streaming and ThreadLocal - posted by N B <nb...@gmail.com> on 2016/01/29 06:31:47 UTC, 8 replies.
- Visualization of KMeans cluster in Spark - posted by Yogesh Vyas <in...@gmail.com> on 2016/01/29 06:44:47 UTC, 0 replies.
- Repartition taking place for all previous windows even after checkpointing - posted by Abhishek Anand <ab...@gmail.com> on 2016/01/29 08:38:19 UTC, 0 replies.
- mapWithState / stateSnapshots() yielding empty rdds? - posted by Sebastian Piu <se...@gmail.com> on 2016/01/29 10:36:52 UTC, 1 replies.
- Number of batches in the Streaming Statics visualization screen - posted by Mehdi Ben Haj Abbes <me...@gmail.com> on 2016/01/29 10:45:52 UTC, 3 replies.
- Spark Algorithms as WEB Application - posted by rahulganesh <dr...@gmail.com> on 2016/01/29 11:44:14 UTC, 1 replies.
- Pyspark filter not empty - posted by patcharee <Pa...@uni.no> on 2016/01/29 16:33:05 UTC, 0 replies.
- Spark Streaming from existing RDD - posted by Sateesh Karuturi <sa...@gmail.com> on 2016/01/29 16:35:56 UTC, 1 replies.
- mapWithState: multiple operations on the same stream - posted by Udo Fholl <ud...@gmail.com> on 2016/01/29 19:40:05 UTC, 0 replies.
- mapWithState: remove key - posted by Udo Fholl <ud...@gmail.com> on 2016/01/29 19:45:36 UTC, 1 replies.
- How to control the number of files for dynamic partition in Spark SQL? - posted by Benyi Wang <be...@gmail.com> on 2016/01/29 21:26:06 UTC, 1 replies.
- How to use DStream reparation() ? - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/01/29 22:54:41 UTC, 1 replies.
- GoogleAnalytics GAData - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/01/29 23:03:52 UTC, 0 replies.
- [MLlib] What is the best way to forecast the next month page visit? - posted by diplomatic Guru <di...@gmail.com> on 2016/01/29 23:31:35 UTC, 0 replies.
- saveAsTextFile is not writing to local fs - posted by Siva <sb...@gmail.com> on 2016/01/30 00:38:29 UTC, 2 replies.
- Reading lzo+index with spark-csv (Splittable reads) - posted by syepes <sy...@gmail.com> on 2016/01/30 01:43:54 UTC, 1 replies.
- Garbage collections issue on MapPartitions - posted by rcollich <rc...@gmail.com> on 2016/01/30 01:53:24 UTC, 0 replies.
- Reading multiple avro files from a dir - Spark 1.5.1 - posted by Ajinkya Kale <ka...@gmail.com> on 2016/01/30 02:18:45 UTC, 0 replies.
- Re: stopping spark stream app - posted by agateaaa <ag...@gmail.com> on 2016/01/30 02:36:03 UTC, 0 replies.
- can't kill spark job in supervise mode - posted by PhuDuc Nguyen <du...@gmail.com> on 2016/01/30 17:19:50 UTC, 2 replies.
- deep learning with heterogeneous cloud computing using spark - posted by Abid Malik <ab...@gmail.com> on 2016/01/30 18:20:39 UTC, 2 replies.
- Product similarity with TF/IDF and Cosine similarity (DIMSUM) - posted by Alan Prando <al...@gmail.com> on 2016/01/30 22:29:58 UTC, 0 replies.