- Re: Parallel execution of RDDs - posted by Brian Parker <as...@gmail.com> on 2015/09/01 00:43:55 UTC, 0 replies.
- bulk upload to Elasticsearch and shuffle behavior - posted by Eric Walker <er...@gmail.com> on 2015/09/01 01:09:37 UTC, 2 replies.
- Re: Problems with Tungsten in Spark 1.5.0-rc2 - posted by Davies Liu <da...@databricks.com> on 2015/09/01 01:38:36 UTC, 5 replies.
- Re: Where is the doc about the spark rest api ? - posted by canan chen <cc...@gmail.com> on 2015/09/01 02:11:06 UTC, 0 replies.
- Connection closed error while running Terasort - posted by Suman Somasundar <su...@oracle.com> on 2015/09/01 02:13:14 UTC, 1 replies.
- Group by specific key and save as parquet - posted by gtinside <gt...@gmail.com> on 2015/09/01 02:27:33 UTC, 1 replies.
- Re: [MLlib] DIMSUM row similarity? - posted by Brian Parker <as...@gmail.com> on 2015/09/01 02:52:19 UTC, 0 replies.
- Re: Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs - posted by Timothy Sum Hon Mun <ti...@gmail.com> on 2015/09/01 04:23:36 UTC, 3 replies.
- Re: spark-submit issue - posted by Pranay Tonpay <pr...@impetus.co.in> on 2015/09/01 07:14:22 UTC, 0 replies.
- Submitted applications does not run. - posted by Madawa Soysa <ma...@cse.mrt.ac.lk> on 2015/09/01 08:17:52 UTC, 11 replies.
- spark 1.5 sort slow - posted by patcharee <Pa...@uni.no> on 2015/09/01 10:06:53 UTC, 1 replies.
- Re: Spark executor OOM issue on YARN - posted by ponkin <al...@ya.ru> on 2015/09/01 10:36:04 UTC, 0 replies.
- Re: Spark shell and StackOverFlowError - posted by ponkin <al...@ya.ru> on 2015/09/01 11:06:51 UTC, 0 replies.
- Re: How to effieciently write sorted neighborhood in pyspark - posted by shahid qadri <sh...@icloud.com> on 2015/09/01 11:13:58 UTC, 0 replies.
- Custom Partitioner - posted by shahid qadri <sh...@icloud.com> on 2015/09/01 11:14:48 UTC, 8 replies.
- How to determine the value for spark.sql.shuffle.partitions? - posted by Romi Kuntsman <ro...@totango.com> on 2015/09/01 11:17:49 UTC, 2 replies.
- Re: How to compute the probability of each class in Naive Bayes - posted by Yanbo Liang <yb...@gmail.com> on 2015/09/01 11:48:31 UTC, 5 replies.
- HiveThriftServer not registering with Zookeeper - posted by sreeramvenkat <sr...@infosys.com> on 2015/09/01 12:10:37 UTC, 0 replies.
- Re: Is it possible to create spark cluster in different network? - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/09/01 12:47:23 UTC, 1 replies.
- How mature is spark sql - posted by rakesh sharma <ra...@hotmail.com> on 2015/09/01 13:07:30 UTC, 1 replies.
- Schema From parquet file - posted by Hafiz Mujadid <ha...@gmail.com> on 2015/09/01 13:07:50 UTC, 1 replies.
- Error using spark.driver.userClassPathFirst=true - posted by cgalan <cg...@gmail.com> on 2015/09/01 13:28:58 UTC, 1 replies.
- reading multiple parquet file using spark sql - posted by Hafiz Mujadid <ha...@gmail.com> on 2015/09/01 13:31:41 UTC, 1 replies.
- Spark job killed - posted by Silvio Bernardinello <sb...@beintoo.com> on 2015/09/01 14:26:43 UTC, 1 replies.
- Re: Potential NPE while exiting spark-shell - posted by nasokan <an...@gmail.com> on 2015/09/01 14:52:00 UTC, 0 replies.
- spark streaming 1.3 with kafka - posted by Shushant Arora <sh...@gmail.com> on 2015/09/01 15:02:41 UTC, 8 replies.
- Re: Memory-efficient successive calls to repartition() - posted by Aurélien Bellet <au...@telecom-paristech.fr> on 2015/09/01 16:48:43 UTC, 6 replies.
- Web UI is not showing up - posted by Sunil Rathee <ra...@gmail.com> on 2015/09/01 17:26:44 UTC, 3 replies.
- Question about Google Books Ngrams with pyspark (1.4.1) - posted by Bertrand <be...@gmail.com> on 2015/09/01 17:39:20 UTC, 4 replies.
- What should be the optimal value for spark.sql.shuffle.partition? - posted by unk1102 <um...@gmail.com> on 2015/09/01 18:11:01 UTC, 2 replies.
- cached data between jobs - posted by Eric Walker <er...@gmail.com> on 2015/09/01 18:53:51 UTC, 2 replies.
- Error when creating an ALS model in spark - posted by Madawa Soysa <ma...@cse.mrt.ac.lk> on 2015/09/01 18:55:24 UTC, 0 replies.
- Intermittent performance degradation in Spark Streaming - posted by Michael Siler <mi...@gmail.com> on 2015/09/01 19:26:09 UTC, 0 replies.
- Resource allocation in SPARK streaming - posted by anshu shukla <an...@gmail.com> on 2015/09/01 19:55:01 UTC, 1 replies.
- Re: How to avoid shuffle errors for a large join ? - posted by Thomas Dudziak <to...@gmail.com> on 2015/09/01 20:13:28 UTC, 3 replies.
- What is the current status of ML ? - posted by Sa...@wellsfargo.com on 2015/09/01 20:23:06 UTC, 2 replies.
- Executor lost failure - posted by Priya Ch <le...@gmail.com> on 2015/09/01 20:28:51 UTC, 1 replies.
- Re: Reading xml in java using spark - posted by Darin McBeath <dd...@yahoo.com.INVALID> on 2015/09/01 20:58:12 UTC, 0 replies.
- Conditionally do things different on the first minibatch vs subsequent minibatches in a dstream - posted by steve_ash <st...@gmail.com> on 2015/09/01 21:36:26 UTC, 1 replies.
- Re: Hung spark executors don't count toward worker memory limit - posted by hai <ha...@evertrue.com> on 2015/09/01 23:00:37 UTC, 0 replies.
- extracting file path using dataframes - posted by Matt K <ma...@gmail.com> on 2015/09/02 01:00:58 UTC, 2 replies.
- spark 1.4.1 saveAsTextFile is slow on emr-4.0.0 - posted by Alexander Pivovarov <ap...@gmail.com> on 2015/09/02 01:01:04 UTC, 4 replies.
- Spark + Druid - posted by Harish Butani <rh...@gmail.com> on 2015/09/02 06:04:08 UTC, 3 replies.
- OOM in spark driver - posted by ankit tyagi <an...@gmail.com> on 2015/09/02 08:04:15 UTC, 0 replies.
- Save dataframe into hbase - posted by Hafiz Mujadid <ha...@gmail.com> on 2015/09/02 10:04:17 UTC, 2 replies.
- Re: Too many open files issue - posted by Steve Loughran <st...@hortonworks.com> on 2015/09/02 10:33:14 UTC, 4 replies.
- Multiple spark-submits vs akka-actors - posted by srungarapu vamsi <sr...@gmail.com> on 2015/09/02 12:32:57 UTC, 2 replies.
- How to Serialize and Reconstruct JavaRDD later? - posted by Raja Reddy <kl...@gmail.com> on 2015/09/02 13:24:51 UTC, 1 replies.
- Unable to understand error “SparkListenerBus has already stopped! Dropping event …” - posted by Adrien Mogenet <ad...@contentsquare.com> on 2015/09/02 14:22:32 UTC, 0 replies.
- Error using SQLContext in spark - posted by rakesh sharma <ra...@hotmail.com> on 2015/09/02 15:11:48 UTC, 1 replies.
- Simple join of two Spark DataFrame failing with “org.apache.spark.sql.AnalysisException: Cannot resolve column name” - posted by "steve.felsheim" <st...@gmail.com> on 2015/09/02 17:11:56 UTC, 0 replies.
- Small File to HDFS - posted by ni...@free.fr on 2015/09/02 18:07:02 UTC, 18 replies.
- Inferring JSON schema from a JSON string in a dataframe column - posted by mstang <mi...@ericsson.com> on 2015/09/02 18:26:57 UTC, 0 replies.
- Spark MLlib Decision Tree Node Accuracy - posted by derechan <de...@visa.com> on 2015/09/02 19:54:43 UTC, 1 replies.
- ERROR WHILE REPARTITION - posted by shahid ashraf <sh...@trialx.com> on 2015/09/02 20:26:43 UTC, 0 replies.
- `sbt core/test` hangs on LogUrlsStandaloneSuite? - posted by Jacek Laskowski <ja...@japila.pl> on 2015/09/02 21:24:26 UTC, 0 replies.
- Understanding Batch Processing Time - posted by Snehal Nagmote <na...@gmail.com> on 2015/09/02 21:55:11 UTC, 3 replies.
- Kafka Direct Stream join without data shuffle - posted by Chen Song <ch...@gmail.com> on 2015/09/02 22:06:01 UTC, 1 replies.
- Spark DataFrame saveAsTable with partitionBy creates no ORC file in HDFS - posted by unk1102 <um...@gmail.com> on 2015/09/02 22:34:34 UTC, 1 replies.
- wild cards in spark sql - posted by Hafiz Mujadid <ha...@gmail.com> on 2015/09/02 22:50:02 UTC, 2 replies.
- large number of import-related function calls in PySpark profile - posted by "Priedhorsky, Reid" <re...@lanl.gov> on 2015/09/02 23:10:21 UTC, 5 replies.
- Is it required to remove checkpoint when submitting a code change? - posted by Ricardo Luis Silva Paiva <ri...@corp.globo.com> on 2015/09/02 23:48:21 UTC, 7 replies.
- Hbase Lookup - posted by ayan guha <gu...@gmail.com> on 2015/09/02 23:53:40 UTC, 7 replies.
- spark-submit not using conf/spark-defaults.conf - posted by Axel Dahl <ax...@whisperstream.com> on 2015/09/03 01:38:46 UTC, 4 replies.
- Unbale to run Group BY on Large File - posted by "SAHA, DEBOBROTA" <ds...@att.com> on 2015/09/03 01:46:39 UTC, 3 replies.
- Parquet partitioning for unique identifier - posted by Kohki Nishio <ta...@gmail.com> on 2015/09/03 02:11:41 UTC, 6 replies.
- Problem while loading saved data - posted by Amila De Silva <ja...@gmail.com> on 2015/09/03 03:25:36 UTC, 5 replies.
- FlatMap Explanation - posted by Ashish Soni <as...@gmail.com> on 2015/09/03 04:05:50 UTC, 3 replies.
- Alter table fails to find table - posted by Tim Smith <se...@gmail.com> on 2015/09/03 06:29:09 UTC, 1 replies.
- Getting an error when trying to read a GZIPPED file - posted by Spark Enthusiast <sp...@yahoo.in> on 2015/09/03 06:41:28 UTC, 1 replies.
- Re: Slow Mongo Read from Spark - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/09/03 09:15:09 UTC, 2 replies.
- Re: Managing httpcomponent dependency in Spark/Solr - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/09/03 09:22:23 UTC, 1 replies.
- Re: Exceptions in threads in executor code don't get caught properly - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/09/03 09:26:55 UTC, 1 replies.
- INDEXEDRDD in PYSPARK - posted by shahid ashraf <sh...@trialx.com> on 2015/09/03 13:36:30 UTC, 0 replies.
- Fwd: Code generation for GPU - posted by kiran lonikar <lo...@gmail.com> on 2015/09/03 13:58:22 UTC, 2 replies.
- LZO-compressed files - posted by Bertrand <be...@gmail.com> on 2015/09/03 14:19:31 UTC, 0 replies.
- RE: spark 1.4.1 saveAsTextFile (and Parquet) is slow on emr-4.0.0 - posted by Ewan Leith <ew...@realitymine.com> on 2015/09/03 14:42:17 UTC, 0 replies.
- Parsing Avro from Kafka Message - posted by Daniel Haviv <da...@veracity-group.com> on 2015/09/03 14:47:26 UTC, 2 replies.
- Tuning - tasks per core - posted by Hans van den Bogert <ha...@gmail.com> on 2015/09/03 14:56:26 UTC, 1 replies.
- Re: Input size increasing every iteration of gradient boosted trees [1.4] - posted by Peter Rudenko <pe...@gmail.com> on 2015/09/03 14:56:32 UTC, 0 replies.
- Re: Input size increasing every iteration of gradient boosted trees [1.4] - posted by Sean Owen <so...@cloudera.com> on 2015/09/03 15:53:17 UTC, 0 replies.
- How to Take the whole file as a partition - posted by Shuai Zheng <sz...@gmail.com> on 2015/09/03 16:22:12 UTC, 1 replies.
- Batchdurationmillis seems "sticky" with direct Spark streaming - posted by Dmitry Goldenberg <dg...@gmail.com> on 2015/09/03 16:34:44 UTC, 23 replies.
- pySpark window functions are not working in the same way as Spark/Scala ones - posted by Sergey Shcherbakov <se...@gmail.com> on 2015/09/03 16:41:05 UTC, 1 replies.
- spark-csv package - output to filename.csv? - posted by Ewan Leith <ew...@realitymine.com> on 2015/09/03 17:04:39 UTC, 0 replies.
- Re: How to Take the whole file as a partition - posted by Tao Lu <ta...@gmail.com> on 2015/09/03 17:06:54 UTC, 1 replies.
- NOT IN in Spark SQL - posted by Pietro Gentile <pi...@gmail.com> on 2015/09/03 17:16:17 UTC, 2 replies.
- Resource allocation issue - is it possible to submit a new job in existing application under a different user? - posted by Dhaval Patel <dh...@gmail.com> on 2015/09/03 18:03:58 UTC, 2 replies.
- spark.shuffle.spill=false ignored? - posted by Eric Walker <er...@gmail.com> on 2015/09/03 18:56:55 UTC, 2 replies.
- VaryMax Rotation and other questions for PCA in Spark MLLIB - posted by Behzad Altaf <be...@gmail.com> on 2015/09/03 19:28:08 UTC, 0 replies.
- Ranger-like Security on Spark - posted by Daniel Schulz <da...@hotmail.com> on 2015/09/03 19:37:01 UTC, 7 replies.
- Re: spark 1.4.1 - LZFException - posted by Yadid Ayzenberg <ya...@media.mit.edu> on 2015/09/03 20:25:16 UTC, 1 replies.
- SparkSQL without access to arrays? - posted by Terry <th...@gmail.com> on 2015/09/03 20:28:29 UTC, 0 replies.
- Spark partitions from CassandraRDD - posted by "Alaa Zubaidi (PDF)" <al...@pdf.com> on 2015/09/03 20:54:48 UTC, 3 replies.
- Re: How to avoid executor time out on yarn spark while dealing with large shuffle skewed data? - posted by Umesh Kacha <um...@gmail.com> on 2015/09/03 22:43:40 UTC, 2 replies.
- spark-shell does not see conf folder content on emr-4 - posted by Alexander Pivovarov <ap...@gmail.com> on 2015/09/03 23:10:30 UTC, 1 replies.
- Re: Running Examples - posted by delbert <de...@outlook.com> on 2015/09/04 01:25:10 UTC, 0 replies.
- Does Spark.ml LogisticRegression assumes only Double valued features? - posted by njoshi <ni...@teamaol.com> on 2015/09/04 01:41:34 UTC, 1 replies.
- different Row objects? - posted by Wei Chen <we...@gmail.com> on 2015/09/04 01:45:03 UTC, 1 replies.
- DataFrame creation delay? - posted by Isabelle Phan <nl...@gmail.com> on 2015/09/04 05:17:02 UTC, 4 replies.
- repartition on direct kafka stream - posted by Shushant Arora <sh...@gmail.com> on 2015/09/04 06:42:26 UTC, 4 replies.
- Drools integration with Spark - posted by Shiva Moorthy <ps...@gmail.com> on 2015/09/04 08:38:14 UTC, 0 replies.
- Drools and Spark Integration - Need Help - posted by Shiva moorthy <ps...@gmail.com> on 2015/09/04 08:41:11 UTC, 2 replies.
- Spark Streaming - Small file in HDFS - posted by Pravesh Jain <pr...@gmail.com> on 2015/09/04 10:22:02 UTC, 0 replies.
- Partitions with zero records & variable task times - posted by mark <ma...@googlemail.com> on 2015/09/04 10:46:24 UTC, 5 replies.
- Output files of saveAsText are getting stuck in temporary directory - posted by Chirag Dewan <ch...@ericsson.com> on 2015/09/04 11:10:03 UTC, 3 replies.
- [spark-streaming] New directStream API reads topic's partitions sequentially. Why? - posted by ponkin <al...@ya.ru> on 2015/09/04 12:17:59 UTC, 2 replies.
- How do we get the Spark Streaming logs while it is active? - posted by Uthayan Suthakar <ut...@gmail.com> on 2015/09/04 13:46:31 UTC, 0 replies.
- Python Spark Streaming example with textFileStream does not work. Why? - posted by Kamilbek <ka...@gmail.com> on 2015/09/04 14:15:18 UTC, 4 replies.
- SparkR / MLlib Integration - posted by Jonathan Hodges <ho...@gmail.com> on 2015/09/04 14:50:15 UTC, 1 replies.
- New to Spark - Paritioning Question - posted by mmike87 <mw...@snl.com> on 2015/09/04 17:06:11 UTC, 3 replies.
- ClassCastException in driver program - posted by Jeff Jones <jj...@adaptivebiotech.com> on 2015/09/04 18:26:08 UTC, 3 replies.
- Why is huge data shuffling in Spark when using union()/coalesce(1,false) on DataFrame? - posted by unk1102 <um...@gmail.com> on 2015/09/04 18:29:45 UTC, 2 replies.
- Spark on Yarn vs Standalone - posted by Alexander Pivovarov <ap...@gmail.com> on 2015/09/04 22:24:24 UTC, 9 replies.
- Is HDFS required for Spark streaming? - posted by N B <nb...@gmail.com> on 2015/09/04 23:45:44 UTC, 6 replies.
- What happens to this RDD? OutOfMemoryError - posted by Kevin Mandich <km...@agari.com.INVALID> on 2015/09/04 23:53:59 UTC, 0 replies.
- Help! Problem of UnsatisfiedLinkError with Spark JDBC JNI dynamic library - posted by Jonathan Yue <jy...@yahoo.com.INVALID> on 2015/09/05 00:32:23 UTC, 0 replies.
- Can we gracefully kill stragglers in Spark SQL - posted by Jia Zhan <zh...@gmail.com> on 2015/09/05 02:25:37 UTC, 0 replies.
- SparkContext initialization error- java.io.IOException: No space left on device - posted by shenyan zhen <sh...@gmail.com> on 2015/09/05 03:50:04 UTC, 3 replies.
- Failing to include multiple JDBC drivers - posted by Nicholas Connor <ni...@gmail.com> on 2015/09/05 05:59:54 UTC, 1 replies.
- Exception in saving MatrixFactorizationModel - posted by Madawa Soysa <ma...@cse.mrt.ac.lk> on 2015/09/05 07:21:44 UTC, 0 replies.
- how to design the Spark application so that Shuffle data will be automatically cleaned up after some iterations - posted by Jun Li <jl...@gmail.com> on 2015/09/05 10:23:49 UTC, 0 replies.
- Problem with repartition/OOM - posted by Yana Kadiyska <ya...@gmail.com> on 2015/09/05 12:59:19 UTC, 2 replies.
- Problem to persist Hibernate entity from Spark job - posted by Zoran Jeremic <zo...@gmail.com> on 2015/09/06 06:11:48 UTC, 4 replies.
- Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier - posted by Terry Hole <hu...@gmail.com> on 2015/09/06 06:47:02 UTC, 6 replies.
- Spark - launchng job for each action - posted by Priya Ch <le...@gmail.com> on 2015/09/06 08:33:32 UTC, 4 replies.
- udaf with multiple return values in spark 1.5.0 - posted by Simon Hafner <re...@gmail.com> on 2015/09/06 11:55:36 UTC, 0 replies.
- [streaming] Using org.apache.spark.Logging will silently break task execution - posted by Alexey Ponkin <al...@ya.ru> on 2015/09/06 22:53:04 UTC, 0 replies.
- Re: [streaming] Using org.apache.spark.Logging will silently break task execution - posted by Gerard Maas <ge...@gmail.com> on 2015/09/07 00:43:38 UTC, 1 replies.
- hadoop2.6.0 + spark1.4.1 + python2.7.10 - posted by Sasha Kacanski <sk...@gmail.com> on 2015/09/07 01:17:45 UTC, 5 replies.
- buildSupportsSnappy exception when reading the snappy file in Spark - posted by "dong.yajun" <do...@gmail.com> on 2015/09/07 04:11:35 UTC, 2 replies.
- Spark SQL - UDF for scoring a model - take $"*" - posted by Night Wolf <ni...@gmail.com> on 2015/09/07 08:35:48 UTC, 5 replies.
- Java UDFs in GROUP BY expressions - posted by James Aley <ja...@swiftkey.com> on 2015/09/07 10:13:28 UTC, 0 replies.
- DataFrames in Spark - Performance when interjected with RDDs - posted by Pallavi Rao <pa...@inmobi.com> on 2015/09/07 10:47:55 UTC, 0 replies.
- Adding additional jars to distributed cache (yarn-client) - posted by Srikanth Sundarrajan <sr...@hotmail.com> on 2015/09/07 11:54:54 UTC, 0 replies.
- Exception when restoring spark streaming with batch RDD from checkpoint. - posted by ZhengHanbin <ha...@163.com> on 2015/09/07 12:04:42 UTC, 2 replies.
- Spark checkpoining error when joining static dataset with DStream - posted by vermaRaj90 <ra...@gmail.com> on 2015/09/07 12:53:20 UTC, 0 replies.
- Zeppelin + Spark on EMR - posted by shahab <sh...@gmail.com> on 2015/09/07 12:54:32 UTC, 0 replies.
- Sending yarn application logs to web socket - posted by Jeetendra Gangele <ga...@gmail.com> on 2015/09/07 14:23:21 UTC, 4 replies.
- Shared data between algorithms - posted by Somabha Bhattacharjya <bh...@gmail.com> on 2015/09/07 15:10:50 UTC, 0 replies.
- OutOfMemory error with Spark ML 1.5 logreg example - posted by Zoltán Tóth <zo...@gmail.com> on 2015/09/07 15:27:02 UTC, 6 replies.
- Access a Broadcast variable causes Spark to launch a second context - posted by sstraub <ss...@avantgarde-labs.de> on 2015/09/07 15:58:43 UTC, 1 replies.
- Spark ANN - posted by Ruslan Dautkhanov <da...@gmail.com> on 2015/09/07 20:18:03 UTC, 12 replies.
- Spark 1.4 RDD to DF fails with toDF() - posted by Gheorghe Postelnicu <gh...@gmail.com> on 2015/09/07 22:48:00 UTC, 7 replies.
- Parquet Array Support Broken? - posted by Alex Kozlov <al...@gmail.com> on 2015/09/07 22:56:51 UTC, 6 replies.
- Spark summit Asia - posted by Kevin Jung <it...@samsung.com> on 2015/09/08 02:35:01 UTC, 1 replies.
- Support of other languages? - posted by Rahul Palamuttam <ra...@gmail.com> on 2015/09/08 04:54:02 UTC, 4 replies.
- Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar - posted by "Nicholas R. Peterson" <nr...@gmail.com> on 2015/09/08 05:38:46 UTC, 10 replies.
- Split content into multiple Parquet files - posted by Adrien Mogenet <ad...@contentsquare.com> on 2015/09/08 08:34:41 UTC, 2 replies.
- Can not allocate executor when running spark on mesos - posted by canan chen <cc...@gmail.com> on 2015/09/08 09:24:59 UTC, 4 replies.
- about mr-style merge sort - posted by 周千昊 <qh...@apache.org> on 2015/09/08 10:46:50 UTC, 5 replies.
- Applying transformations on a JavaRDD using reflection - posted by Nirmal Fernando <ni...@wso2.com> on 2015/09/08 12:07:06 UTC, 2 replies.
- Re: How to read files from S3 from Spark local when there is a http proxy - posted by tariq <do...@gmail.com> on 2015/09/08 14:51:51 UTC, 2 replies.
- 1.5 Build Errors - posted by Benjamin Zaitlen <qu...@gmail.com> on 2015/09/08 14:53:12 UTC, 8 replies.
- Spark intermittently fails to recover from a worker failure (in standalone mode) - posted by Cheuk Lam <ch...@hotmail.com> on 2015/09/08 15:16:25 UTC, 0 replies.
- [streaming] DStream with window performance issue - posted by Alexey Ponkin <al...@ya.ru> on 2015/09/08 15:18:54 UTC, 0 replies.
- No auto decompress in Spark Java textFile function? - posted by Chris Teoh <ch...@gmail.com> on 2015/09/08 15:19:09 UTC, 2 replies.
- Getting Started with Spark - posted by Bryan Jeffrey <br...@gmail.com> on 2015/09/08 15:46:40 UTC, 1 replies.
- Java vs. Scala for Spark - posted by Bryan Jeffrey <br...@gmail.com> on 2015/09/08 15:50:35 UTC, 9 replies.
- Re: [streaming] DStream with window performance issue - posted by Cody Koeninger <co...@koeninger.org> on 2015/09/08 16:03:50 UTC, 9 replies.
- Partitioning a RDD for training multiple classifiers - posted by Maximo Gurmendez <mg...@dataxu.com> on 2015/09/08 16:47:01 UTC, 3 replies.
- Compress JSON dataframes - posted by Sa...@wellsfargo.com on 2015/09/08 17:54:48 UTC, 0 replies.
- Spark with proxy - posted by Mohammad Tariq <do...@gmail.com> on 2015/09/08 18:15:11 UTC, 0 replies.
- Different Kafka createDirectStream implementations - posted by Dan Dutrow <da...@gmail.com> on 2015/09/08 18:44:55 UTC, 4 replies.
- Best way to import data from Oracle to Spark? - posted by Cui Lin <ic...@gmail.com> on 2015/09/08 19:11:00 UTC, 7 replies.
- Can Spark Provide Multiple Context Support? - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2015/09/08 19:12:29 UTC, 1 replies.
- foreachRDD causing executor lost failure - posted by Priya Ch <le...@gmail.com> on 2015/09/08 19:15:11 UTC, 1 replies.
- Creating Dataframe from XML parsed by scalaxb - posted by Christopher Matta <cm...@mapr.com> on 2015/09/08 19:17:18 UTC, 0 replies.
- NPE while reading ORC file using Spark 1.4 API - posted by unk1102 <um...@gmail.com> on 2015/09/08 20:39:05 UTC, 2 replies.
- read compressed hdfs files using SparkContext.textFile? - posted by shenyan zhen <sh...@gmail.com> on 2015/09/08 21:13:23 UTC, 1 replies.
- performance when checking if data frame is empty or not - posted by Axel Dahl <ax...@whisperstream.com> on 2015/09/08 22:22:26 UTC, 1 replies.
- Event logging not working when worker machine terminated - posted by David Rosenstrauch <da...@darose.net> on 2015/09/09 05:15:11 UTC, 5 replies.
- Contribution in Apche Spark - posted by Chintan Bhatt <ch...@charusat.ac.in> on 2015/09/09 06:20:03 UTC, 1 replies.
- Task serialization error for mllib.MovieLensALS - posted by Jeff Zhang <zj...@gmail.com> on 2015/09/09 08:14:11 UTC, 0 replies.
- How to read compressed parquet file - posted by 李铖 <li...@gmail.com> on 2015/09/09 09:29:16 UTC, 2 replies.
- java.lang.NoSuchMethodError and yarn-client mode - posted by Tom Seddon <mr...@gmail.com> on 2015/09/09 10:41:17 UTC, 3 replies.
- [ANNOUNCE] Announcing Spark 1.5.0 - posted by Reynold Xin <rx...@databricks.com> on 2015/09/09 11:47:30 UTC, 0 replies.
- I am very new to Spark. I have a very basic question. I have an array of values: listofECtokens: Array[String] = Array(EC-17A5206955089011B, EC-17A5206955089011A) I want to filter an RDD for all of these token values. I tried the following way: val ECtokens = for (token <- listofECtokens) rddAll.filter(line => line.contains(token)) Output: ECtokens: Unit = () I got an empty Unit even when there are records with these tokens. What am I doing wrong? - posted by prachicsa <pr...@gmail.com> on 2015/09/09 11:55:04 UTC, 0 replies.
- Filtering records for all values of an array in Spark - posted by prachicsa <pr...@gmail.com> on 2015/09/09 11:58:36 UTC, 0 replies.
- I want to know the parition result in each node - posted by szy <sz...@hotmail.com> on 2015/09/09 12:02:36 UTC, 0 replies.
- Re: I am very new to Spark. I have a very basic question. I have an array of values: listofECtokens: Array[String] = Array(EC-17A5206955089011B, EC-17A5206955089011A) I want to filter an RDD for all of these token values. I tried the following way: val ECtokens = for (token <- listofECtokens) rddAll.filter(line => line.contains(token)) Output: ECtokens: Unit = () I got an empty Unit even when there are records with these tokens. What am I doing wrong? - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/09/09 12:13:27 UTC, 1 replies.
- bad substitution for [hdp.version] Error in spark on YARN job - posted by Jeetendra Gangele <ga...@gmail.com> on 2015/09/09 14:14:08 UTC, 1 replies.
- Help getting Spark JDBC metadata - posted by Tom Barber <to...@meteorite.bi> on 2015/09/09 14:17:20 UTC, 0 replies.
- Re: long running Spark Streaming job and eventlog files - posted by jarod7736 <ja...@gmail.com> on 2015/09/09 15:27:53 UTC, 0 replies.
- [Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar - posted by shahab <sh...@gmail.com> on 2015/09/09 16:28:20 UTC, 4 replies.
- JNI issues with mesos - posted by Adrian Bridgett <ad...@opensignal.com> on 2015/09/09 16:59:05 UTC, 3 replies.
- Loading json data into Pair RDD in Spark using java - posted by prachicsa <pr...@gmail.com> on 2015/09/09 17:50:25 UTC, 1 replies.
- spark history server + yarn log aggregation issue - posted by mi...@nomura.com on 2015/09/09 17:57:45 UTC, 0 replies.
- spark.kryo.registrationRequired: Tuple2 is not registered - posted by Marius Soutier <mp...@gmail.com> on 2015/09/09 18:00:22 UTC, 1 replies.
- Driver OOM after upgrading to 1.5 - posted by Sandy Ryza <sa...@cloudera.com> on 2015/09/09 19:10:38 UTC, 4 replies.
- Spark Streaming checkpoints and code upgrade - posted by Nicolas Monchy <ni...@gumgum.com> on 2015/09/09 19:19:05 UTC, 2 replies.
- Cores per executors - posted by Thomas Gerber <th...@radius.com> on 2015/09/09 19:56:15 UTC, 0 replies.
- Problems with Local Checkpoints - posted by Bryan Jeffrey <br...@gmail.com> on 2015/09/09 20:00:08 UTC, 2 replies.
- Spark UI keep redirecting to /null and returns 500 - posted by Rajeev Prasad <ra...@gmail.com> on 2015/09/09 20:26:58 UTC, 0 replies.
- Spark streaming -> cassandra : Fault Tolerance - posted by Samya <sa...@amadeus.com> on 2015/09/09 21:09:37 UTC, 2 replies.
- Spark rdd.mapPartitionsWithIndex() hits physical memory limit after huge data shuffle - posted by unk1102 <um...@gmail.com> on 2015/09/09 21:37:43 UTC, 0 replies.
- Re: Adding/subtracting org.apache.spark.mllib.linalg.Vector in Scala? - posted by Burak Yavuz <br...@gmail.com> on 2015/09/09 23:53:20 UTC, 1 replies.
- Accumulator with non-java-serializable value ? - posted by Thomas Dudziak <to...@gmail.com> on 2015/09/10 00:18:04 UTC, 0 replies.
- ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions() - posted by Ashish Shenoy <as...@instartlogic.com> on 2015/09/10 01:45:16 UTC, 5 replies.
- Filtering an rdd depending upon a list of values in Spark - posted by prachicsa <pr...@gmail.com> on 2015/09/10 05:04:30 UTC, 2 replies.
- build on spark 1.5.0 error with Execution scala-compile-first of goal & Compile failed via zinc server - posted by stark_summer <st...@qq.com> on 2015/09/10 05:50:37 UTC, 0 replies.
- Failed when starting Spark 1.5.0 standalone cluster - posted by Netwaver <wa...@163.com> on 2015/09/10 06:05:49 UTC, 3 replies.
- Tungsten and Spark Streaming - posted by N B <nb...@gmail.com> on 2015/09/10 06:23:47 UTC, 4 replies.
- Creating Parquet external table using HiveContext API - posted by Mohammad Islam <mi...@yahoo.com.INVALID> on 2015/09/10 06:33:21 UTC, 2 replies.
- Re: build on spark 1.5.0 error with Execution scala-compile-first of goal & Compile failed via zinc server - posted by Ted Yu <yu...@gmail.com> on 2015/09/10 06:44:15 UTC, 0 replies.
- spark streaming 1.3 with kafka connection timeout - posted by Shushant Arora <sh...@gmail.com> on 2015/09/10 07:21:05 UTC, 5 replies.
- How to keep history of streaming statistics - posted by "b.bhavesh" <b....@gmail.com> on 2015/09/10 07:51:28 UTC, 2 replies.
- SparkR - Support for Other Models - posted by Manish MAHESHWARI <ma...@dbs.com> on 2015/09/10 07:56:42 UTC, 1 replies.
- Kr - posted by Huy Banh <hu...@gmail.com> on 2015/09/10 08:06:20 UTC, 0 replies.
- Re: Maintaining Kafka Direct API Offsets - posted by Samya <sa...@amadeus.com> on 2015/09/10 08:21:55 UTC, 1 replies.
- Avoiding SQL Injection in Spark SQL - posted by V Dineshkumar <de...@gmail.com> on 2015/09/10 08:32:42 UTC, 4 replies.
- Cassandra row count grouped by multiple columns - posted by Chirag Dewan <ch...@ericsson.com> on 2015/09/10 09:19:46 UTC, 1 replies.
- Custom UDAF Evaluated Over Window - posted by xander92 <al...@ompnt.com> on 2015/09/10 09:26:36 UTC, 1 replies.
- Re: Perf impact of BlockManager byte[] copies - posted by Reynold Xin <rx...@databricks.com> on 2015/09/10 09:46:20 UTC, 0 replies.
- spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL - posted by Todd <bi...@163.com> on 2015/09/10 10:24:26 UTC, 16 replies.
- Terasort on spark -- Jmeter - posted by Shreeharsha G Neelakantachar <sh...@in.ibm.com> on 2015/09/10 11:25:55 UTC, 0 replies.
- Spark-shell throws Hive error when SQLContext.parquetFile, v1.3 - posted by Petr Novak <os...@gmail.com> on 2015/09/10 12:02:00 UTC, 2 replies.
- Using KafkaDirectStream, stopGracefully and exceptions - posted by Krzysztof Zarzycki <k....@gmail.com> on 2015/09/10 12:02:39 UTC, 10 replies.
- Spark Streaming stop gracefully doesn't return to command line after upgrade to 1.4.0 and beyond - posted by Petr Novak <os...@gmail.com> on 2015/09/10 12:11:44 UTC, 2 replies.
- Random Forest MLlib - posted by Yasemin Kaya <go...@gmail.com> on 2015/09/10 15:09:50 UTC, 3 replies.
- pyspark driver in cluster rather than gateway/client - posted by roy <rp...@njit.edu> on 2015/09/10 15:54:02 UTC, 1 replies.
- How to enable Tungsten in Spark 1.5 for Spark SQL? - posted by unk1102 <um...@gmail.com> on 2015/09/10 16:39:48 UTC, 2 replies.
- Spark task hangs infinitely when accessing S3 - posted by Mario Pastorelli <ma...@teralytics.ch> on 2015/09/10 18:35:19 UTC, 1 replies.
- Spark on Mesos with Jobs in Cluster Mode Documentation - posted by "Tom Waterhouse (tomwater)" <to...@cisco.com> on 2015/09/10 19:13:50 UTC, 13 replies.
- connecting to remote spark and reading files on HDFS or s3 in sparkR - posted by roni <ro...@gmail.com> on 2015/09/10 19:50:37 UTC, 2 replies.
- reading files on HDFS /s3 in sparkR -failing - posted by roni <ro...@gmail.com> on 2015/09/10 21:05:15 UTC, 1 replies.
- java.lang.NullPointerException with Twitter API - posted by Jo Sunad <na...@gmail.com> on 2015/09/10 21:29:54 UTC, 1 replies.
- Sprk RDD : want to combine elements that have approx same keys - posted by prateek arora <pr...@gmail.com> on 2015/09/10 21:34:53 UTC, 1 replies.
- How to restrict java unit tests from the maven command line - posted by Stephen Boesch <ja...@gmail.com> on 2015/09/10 22:39:41 UTC, 2 replies.
- Re: broadcast variable get cleaned by ContextCleaner unexpectedly ? - posted by swetha <sw...@gmail.com> on 2015/09/10 22:58:14 UTC, 0 replies.
- Spark UI keeps redirecting to /null and returns 500 - posted by rajeevpra <ra...@gmail.com> on 2015/09/10 23:44:14 UTC, 1 replies.
- Spark based Kafka Producer - posted by Atul Kulkarni <at...@gmail.com> on 2015/09/11 02:35:39 UTC, 6 replies.
- How to create combine DAG visualization? - posted by "b.bhavesh" <b....@gmail.com> on 2015/09/11 07:42:07 UTC, 0 replies.
- Data lost in spark streaming - posted by Bin Wang <wb...@gmail.com> on 2015/09/11 08:32:40 UTC, 3 replies.
- Multilabel classification support - posted by Yasemin Kaya <go...@gmail.com> on 2015/09/11 09:29:00 UTC, 3 replies.
- sparksql query hive data error - posted by stark_summer <st...@qq.com> on 2015/09/11 09:42:45 UTC, 0 replies.
- RE:RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL - posted by prosp4300 <pr...@163.com> on 2015/09/11 10:31:45 UTC, 0 replies.
- Few Conceptual Questions on Spark-SQL and HiveQL - posted by Narayanan K <kn...@gmail.com> on 2015/09/11 10:51:16 UTC, 1 replies.
- MongoDB and Spark - posted by "Mishra, Abhishek" <Ab...@xerox.com> on 2015/09/11 11:01:39 UTC, 5 replies.
- Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space - posted by Jagat Singh <ja...@gmail.com> on 2015/09/11 12:00:18 UTC, 4 replies.
- Exception in Spark-sql insertIntoJDBC command - posted by Baljeet Singh <ba...@gmail.com> on 2015/09/11 12:10:46 UTC, 0 replies.
- selecting columns with the same name in a join - posted by Evert Lammerts <ev...@gmail.com> on 2015/09/11 12:14:01 UTC, 2 replies.
- java.util.NoSuchElementException: key not found - posted by "guoqing0629@yahoo.com.hk" <gu...@yahoo.com.hk> on 2015/09/11 12:35:23 UTC, 2 replies.
- Is there any Spark SQL reference manual? - posted by vivek bhaskar <vi...@gmail.com> on 2015/09/11 12:43:06 UTC, 6 replies.
- Spark does not yet support its JDBC component for Scala 2.11. - posted by Petr Novak <os...@gmail.com> on 2015/09/11 13:53:50 UTC, 3 replies.
- Model summary for linear and logistic regression. - posted by Sebastian Kuepers <se...@publicispixelpark.de> on 2015/09/11 15:25:11 UTC, 1 replies.
- Fwd: MLlib LDA implementation questions - posted by Marko Asplund <ma...@gmail.com> on 2015/09/11 15:29:30 UTC, 2 replies.
- Training the MultilayerPerceptronClassifier - posted by Rory Waite <rw...@sdl.com> on 2015/09/11 15:39:30 UTC, 3 replies.
- Exception Handling : Spark Streaming - posted by Samya <sa...@amadeus.com> on 2015/09/11 16:30:32 UTC, 0 replies.
- Re: Exception Handling : Spark Streaming - posted by Ted Yu <yu...@gmail.com> on 2015/09/11 16:35:25 UTC, 1 replies.
- A way to kill laggard jobs? - posted by Dmitry Goldenberg <dg...@gmail.com> on 2015/09/11 17:11:11 UTC, 0 replies.
- Realtime Data Visualization Tool for Spark - posted by Shashi Vishwakarma <sh...@gmail.com> on 2015/09/11 17:56:50 UTC, 4 replies.
- Help with collect() in Spark Streaming - posted by allonsy <lu...@gmail.com> on 2015/09/11 18:07:43 UTC, 4 replies.
- I'd like to add our company to the Powered by Spark page - posted by Timothy Snyder <ts...@thanxmedia.com> on 2015/09/11 18:19:09 UTC, 0 replies.
- SparkR connection string to Cassandra - posted by Austin Trombley <at...@prosper.com> on 2015/09/11 20:15:54 UTC, 0 replies.
- Spark monitoring - posted by prk77 <pr...@gmail.com> on 2015/09/11 20:16:53 UTC, 1 replies.
- New JavaRDD Inside JavaPairDStream - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2015/09/11 21:09:56 UTC, 1 replies.
- updateStateByKey when the state is very large - posted by "Brush,Ryan" <RB...@CERNER.COM> on 2015/09/11 21:26:50 UTC, 0 replies.
- Error - Calling a package (com.databricks:spark-csv_2.10:1.0.3) with spark-submit - posted by Subhajit Purkayastha <sp...@p3si.net> on 2015/09/11 21:50:14 UTC, 0 replies.
- which install package type for cassandra use - posted by beakesland <be...@gmail.com> on 2015/09/11 21:57:56 UTC, 0 replies.
- UserDefinedTypes - posted by Richard Eggert <ri...@gmail.com> on 2015/09/12 00:00:10 UTC, 0 replies.
- SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver - posted by "Varadhan, Jawahar" <va...@yahoo.com.INVALID> on 2015/09/12 00:02:21 UTC, 0 replies.
- countApproxDistinctByKey in python - posted by LucaMartinetti <lu...@luca.io> on 2015/09/12 01:13:59 UTC, 1 replies.
- Implement "LIKE" in SparkSQL - posted by liam <li...@gmail.com> on 2015/09/12 04:26:18 UTC, 3 replies.
- Multithreaded vs Spark Executor - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2015/09/12 05:07:28 UTC, 1 replies.
- change the spark version - posted by Angel Angel <ar...@gmail.com> on 2015/09/12 08:16:50 UTC, 2 replies.
- Spark K means number of Iterations? - posted by ashensw <as...@wso2.com> on 2015/09/12 09:26:03 UTC, 2 replies.
- Re: SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver - posted by Jörn Franke <jo...@gmail.com> on 2015/09/12 09:32:57 UTC, 0 replies.
- How to create broadcast variable from Java String array? - posted by unk1102 <um...@gmail.com> on 2015/09/12 11:55:52 UTC, 0 replies.
- How do debug YARN client OOM issue? - posted by unk1102 <um...@gmail.com> on 2015/09/12 15:19:58 UTC, 0 replies.
- [Question] ORC - EMRFS Problem - posted by Cazen Lee <ca...@gmail.com> on 2015/09/12 17:15:07 UTC, 5 replies.
- Spark Streaming..Exception - posted by Priya Ch <le...@gmail.com> on 2015/09/12 19:34:33 UTC, 2 replies.
- What is the best way to migrate existing scikit-learn code to PySpark? - posted by Rex X <dn...@gmail.com> on 2015/09/12 20:17:52 UTC, 4 replies.
- UDAF and UDT with SparkSQL 1.5.0 - posted by jussipekkap <jp...@eaglepeaks.com> on 2015/09/12 20:19:34 UTC, 1 replies.
- Why my Spark job is slow and it throws OOM which leads YARN killing executors? - posted by unk1102 <um...@gmail.com> on 2015/09/12 21:52:03 UTC, 2 replies.
- RDD transformation and action running out of memory - posted by Utkarsh Sengar <ut...@gmail.com> on 2015/09/12 23:18:42 UTC, 2 replies.
- What happens when cache is full? - posted by Hemminger Jeff <je...@atware.co.jp> on 2015/09/13 05:14:04 UTC, 1 replies.
- Cogrouping data in dataframes - PairRDD cogroup vs. join - best practices - posted by Matthew Denny <md...@alum.berkeley.edu> on 2015/09/13 06:32:06 UTC, 0 replies.
- Limiting number of cores per job in multi-threaded driver. - posted by Philip Weaver <ph...@gmail.com> on 2015/09/13 07:40:58 UTC, 4 replies.
- Stopping SparkContext and HiveContext - posted by Ophir Cohen <op...@gmail.com> on 2015/09/13 10:48:05 UTC, 2 replies.
- How to Hive UDF in Spark DataFrame? - posted by unk1102 <um...@gmail.com> on 2015/09/13 11:37:26 UTC, 0 replies.
- Parquet partitioning performance issue - posted by sonal sharma <so...@gmail.com> on 2015/09/13 19:54:41 UTC, 1 replies.
- Re: CREATE TABLE ignores database when using PARQUET option - posted by hbogert <ha...@gmail.com> on 2015/09/13 22:50:32 UTC, 0 replies.
- Best way to merge final output part files created by Spark job - posted by unk1102 <um...@gmail.com> on 2015/09/13 23:25:47 UTC, 3 replies.
- Replacing Esper with Spark Streaming? - posted by Otis Gospodnetić <ot...@gmail.com> on 2015/09/14 01:49:40 UTC, 5 replies.
- How to clear Kafka offset in Spark streaming? - posted by Bin Wang <wb...@gmail.com> on 2015/09/14 08:46:28 UTC, 1 replies.
- Interacting with Different Versions of Hive Metastore, how to config? - posted by bg_spark <14...@qq.com> on 2015/09/14 09:01:56 UTC, 0 replies.
- DAG Scheduler deadlock when two RDDs reference each other, force Stages manually? - posted by petranidis <pn...@gmail.com> on 2015/09/14 11:42:18 UTC, 2 replies.
- Spark Streaming Topology - posted by defstat <de...@gmail.com> on 2015/09/14 11:52:20 UTC, 1 replies.
- application failed on large dataset - posted by 周千昊 <qh...@apache.org> on 2015/09/14 15:07:21 UTC, 8 replies.
- Fwd: Spark job failed - posted by Renu Yadav <yr...@gmail.com> on 2015/09/14 15:09:52 UTC, 1 replies.
- Twitter Streming using Twitter Public Streaming API and Apache Spark - posted by Sadaf <sa...@platalytics.com> on 2015/09/14 15:24:13 UTC, 0 replies.
- hdfs-ha on mesos - odd bug - posted by Adrian Bridgett <ad...@opensignal.com> on 2015/09/14 15:55:26 UTC, 6 replies.
- Mailing List - This post has NOT been accepted by the mailing list yet. - posted by defstat <de...@gmail.com> on 2015/09/14 16:07:02 UTC, 0 replies.
- Approach validation - building merged datasets - Spark SQL - posted by Vajra L <va...@gmail.com> on 2015/09/14 17:33:59 UTC, 0 replies.
- Where can I learn how to write udf? - posted by Sa...@wellsfargo.com on 2015/09/14 18:39:29 UTC, 1 replies.
- JavaRDD using Reflection - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2015/09/14 18:54:25 UTC, 3 replies.
- Creating fat jar with all resources.(Spark-Java-Maven) - posted by Vipul Rai <vi...@gmail.com> on 2015/09/14 18:55:25 UTC, 1 replies.
- A way to timeout and terminate a laggard 'Stage' ? - posted by Dmitry Goldenberg <dg...@gmail.com> on 2015/09/14 19:10:54 UTC, 4 replies.
- Parse tab seperated file inc json efficent - posted by matthes <ma...@web.de> on 2015/09/14 19:56:09 UTC, 0 replies.
- Trouble using dynamic allocation and shuffle service. - posted by Philip Weaver <ph...@gmail.com> on 2015/09/14 20:36:06 UTC, 2 replies.
- Spark Streaming application code change and stateful transformations - posted by Ofir Kerker <of...@gmail.com> on 2015/09/14 20:49:12 UTC, 4 replies.
- unoin streams not working for streams > 3 - posted by Василец Дмитрий <pr...@gmail.com> on 2015/09/14 22:33:20 UTC, 4 replies.
- using existing R packages from SparkR - posted by bobtreacy <rt...@columbia.edu> on 2015/09/14 23:10:07 UTC, 0 replies.
- Re: Null Value in DecimalType column of DataFrame - posted by Yin Huai <yh...@databricks.com> on 2015/09/14 23:54:41 UTC, 3 replies.
- add external jar file to Spark shell vs. Scala Shell - posted by Lan Jiang <lj...@gmail.com> on 2015/09/15 00:11:31 UTC, 0 replies.
- How to convert dataframe to a nested StructType schema - posted by Hao Wang <bi...@gmail.com> on 2015/09/15 02:06:37 UTC, 3 replies.
- Spark aggregateByKey Issues - posted by 毕岩 <bi...@gmail.com> on 2015/09/15 05:25:39 UTC, 4 replies.
- Spark Streaming Suggestion - posted by srungarapu vamsi <sr...@gmail.com> on 2015/09/15 06:19:50 UTC, 5 replies.
- Caching intermediate results in Spark ML pipeline? - posted by Jingchu Liu <li...@gmail.com> on 2015/09/15 06:20:19 UTC, 8 replies.
- Change protobuf version or any other third party library version in Spark application - posted by Lan Jiang <lj...@gmail.com> on 2015/09/15 06:47:26 UTC, 8 replies.
- Setting Executor memory - posted by Thomas Gerber <th...@radius.com> on 2015/09/15 07:15:03 UTC, 0 replies.
- why spark and kafka always crash - posted by Joanne Contact <jo...@gmail.com> on 2015/09/15 07:58:43 UTC, 1 replies.
- [ANNOUNCE] Apache Gora 0.6.1 Release - posted by lewis john mcgibbney <le...@apache.org> on 2015/09/15 08:26:38 UTC, 0 replies.
- How to speed up MLlib LDA? - posted by Marko Asplund <ma...@gmail.com> on 2015/09/15 08:30:38 UTC, 7 replies.
- Relational Log Data - posted by 328d95 <20...@student.uwa.edu.au> on 2015/09/15 11:33:56 UTC, 1 replies.
- DStream flatMap "swallows" records - posted by Jeffrey Jedele <je...@gmail.com> on 2015/09/15 12:07:00 UTC, 0 replies.
- spark performance - executor computing time - posted by patcharee <Pa...@uni.no> on 2015/09/15 13:35:28 UTC, 2 replies.
- How does driver memory utilized - posted by Renu Yadav <yr...@gmail.com> on 2015/09/15 13:45:59 UTC, 1 replies.
- Re: Worker Machine running out of disk for Long running Streaming process - posted by gaurav sharma <sh...@gmail.com> on 2015/09/15 13:46:04 UTC, 0 replies.
- Re: Directly reading data from S3 to EC2 with PySpark - posted by Cazen <ca...@gmail.com> on 2015/09/15 13:54:49 UTC, 2 replies.
- Prevent spark from serializing some objects - posted by lev <ka...@gmail.com> on 2015/09/15 14:30:30 UTC, 0 replies.
- Using ML KMeans without hardcoded feature vector creation - posted by Tóth Zoltán <tz...@looper.hu> on 2015/09/15 16:13:56 UTC, 0 replies.
- Spark wastes a lot of space (tmp data) for iterative jobs - posted by Ali Hadian <ha...@comp.iust.ac.ir> on 2015/09/15 16:42:05 UTC, 4 replies.
- Getting parent RDD - posted by Samya <sa...@amadeus.com> on 2015/09/15 17:43:26 UTC, 3 replies.
- mappartition's FlatMapFunction help - posted by dinizthiagobr <di...@gmail.com> on 2015/09/15 19:11:28 UTC, 2 replies.
- Managing scheduling delay in Spark Streaming - posted by Michal Čizmazia <mi...@gmail.com> on 2015/09/15 20:26:21 UTC, 2 replies.
- Dynamic Workflow Execution using Spark - posted by Ashish Soni <as...@gmail.com> on 2015/09/15 22:19:59 UTC, 1 replies.
- GraphX, graph clustering, pattern matching - posted by Alex Karargyris <ak...@gmail.com> on 2015/09/16 05:45:31 UTC, 0 replies.
- Difference between sparkDriver and "executor ID driver" - posted by Muler <mu...@gmail.com> on 2015/09/16 06:14:23 UTC, 1 replies.
- Re: Get all the Nodes connected to a Node - posted by Robineast <Ro...@xense.co.uk> on 2015/09/16 08:58:01 UTC, 0 replies.
- Idle time between jobs - posted by patcharee <Pa...@uni.no> on 2015/09/16 11:35:24 UTC, 0 replies.
- How to recovery DStream from checkpoint directory? - posted by Bin Wang <wb...@gmail.com> on 2015/09/16 12:22:33 UTC, 7 replies.
- How to update python code in memory - posted by Margus Roo <ma...@roo.ee> on 2015/09/16 13:06:43 UTC, 1 replies.
- Spark streaming on spark-standalone/ yarn inside Spring XD - posted by Vignesh Radhakrishnan <Vi...@Altiux.com> on 2015/09/16 14:01:01 UTC, 0 replies.
- Spark Thrift Server JDBC Drivers - posted by Daniel Haviv <da...@veracity-group.com> on 2015/09/16 14:34:04 UTC, 2 replies.
- How to calculate average from multiple values - posted by diplomatic Guru <di...@gmail.com> on 2015/09/16 16:46:31 UTC, 1 replies.
- Spark on YARN / aws - executor lost on node restart - posted by Adrian Tanase <at...@adobe.com> on 2015/09/16 17:01:59 UTC, 2 replies.
- Spark Cassandra Filtering - posted by Ashish Soni <as...@gmail.com> on 2015/09/16 18:20:25 UTC, 0 replies.
- Spark SQL 'create table' options - posted by Dan LaBar <da...@gmail.com> on 2015/09/16 20:02:29 UTC, 0 replies.
- DataFrame repartition not repartitioning - posted by Steve Annessa <st...@gmail.com> on 2015/09/16 20:08:24 UTC, 1 replies.
- Re: Spark streaming on spark-standalone/ yarn inside Spring XD - posted by Tathagata Das <td...@databricks.com> on 2015/09/16 20:19:46 UTC, 3 replies.
- why when I double the number of workers, ml LogisticRegression fitting time is not reduced in half? - posted by julia <vi...@adobe.com> on 2015/09/16 20:43:24 UTC, 1 replies.
- problem with a very simple word count program - posted by huajun <hu...@gmail.com> on 2015/09/16 21:07:21 UTC, 2 replies.
- Incorrect results with spark sql - posted by gpatcham <gp...@gmail.com> on 2015/09/16 21:20:55 UTC, 0 replies.
- SparkR - calling as.vector() with rdd dataframe causes error - posted by ekraffmiller <el...@gmail.com> on 2015/09/16 21:30:20 UTC, 7 replies.
- Suggested Method for Execution of Periodic Actions - posted by Bryan Jeffrey <br...@gmail.com> on 2015/09/16 21:32:52 UTC, 3 replies.
- unpersist RDD from another thread - posted by Paul Weiss <pa...@gmail.com> on 2015/09/16 22:06:31 UTC, 3 replies.
- Issue with writing Dataframe to Vertica through JDBC - posted by Divya Ravichandran <di...@gmail.com> on 2015/09/16 22:52:55 UTC, 0 replies.
- parquet error - posted by Chengi Liu <ch...@gmail.com> on 2015/09/17 01:59:50 UTC, 2 replies.
- spark-submit chronos issue - posted by "Saurabh Malviya (samalviy)" <sa...@cisco.com> on 2015/09/17 02:20:15 UTC, 0 replies.
- Stopping criteria for gradient descent - posted by Nishanth P S <ni...@gmail.com> on 2015/09/17 02:31:01 UTC, 3 replies.
- Table is modified by DataFrameWriter - posted by "guoqing0629@yahoo.com.hk" <gu...@yahoo.com.hk> on 2015/09/17 04:11:29 UTC, 4 replies.
- spark sql hook - posted by "r7raul1984@163.com" <r7...@163.com> on 2015/09/17 04:53:58 UTC, 4 replies.
- Iterating over JavaRDD - posted by Tapan Sharma <ta...@gmail.com> on 2015/09/17 05:30:09 UTC, 1 replies.
- Lost tasks in Spark SQL join jobs - posted by Gang Bai <ba...@staff.sina.com.cn> on 2015/09/17 05:56:44 UTC, 1 replies.
- Input parsing time - posted by Carlos Eduardo Santos <ce...@gmail.com> on 2015/09/17 09:09:28 UTC, 1 replies.
- Spark Web UI + NGINX - posted by Renato Perini <re...@gmail.com> on 2015/09/17 11:06:12 UTC, 2 replies.
- a document for JDK version testing status - posted by lu...@sina.com on 2015/09/17 11:15:34 UTC, 0 replies.
- Saprk.frame.Akkasize - posted by Angel Angel <ar...@gmail.com> on 2015/09/17 11:28:15 UTC, 1 replies.
- Error with twitter streaming - posted by Deepak Subhramanian <de...@gmail.com> on 2015/09/17 11:50:34 UTC, 0 replies.
- Re: [Spark Streaming] Distribute custom receivers evenly across excecutors - posted by "patrizio.munzi" <pa...@gmail.com> on 2015/09/17 14:49:19 UTC, 0 replies.
- Spark Streaming kafka directStream value decoder issue - posted by srungarapu vamsi <sr...@gmail.com> on 2015/09/17 15:03:49 UTC, 7 replies.
- Checkpointing with Kinesis - posted by Alan Dipert <al...@dipert.org> on 2015/09/17 17:48:48 UTC, 4 replies.
- NGINX + Spark Web UI - posted by mjordan79 <re...@gmail.com> on 2015/09/17 17:50:14 UTC, 1 replies.
- Can we do dataframe.query like Pandas dataframe in spark? - posted by Rex X <dn...@gmail.com> on 2015/09/17 18:32:49 UTC, 2 replies.
- How to add sparkSQL into a standalone application - posted by Cui Lin <ic...@gmail.com> on 2015/09/17 19:59:19 UTC, 3 replies.
- SPARK-SQL parameter tuning for performance - posted by Sadhan Sood <sa...@gmail.com> on 2015/09/17 20:01:44 UTC, 0 replies.
- in joins, does one side stream? - posted by Koert Kuipers <ko...@tresata.com> on 2015/09/17 20:21:38 UTC, 7 replies.
- Spark data type guesser UDAF - posted by Ruslan Dautkhanov <da...@gmail.com> on 2015/09/17 20:32:13 UTC, 1 replies.
- Has anyone used the Twitter API for location filtering? - posted by Jo Sunad <na...@gmail.com> on 2015/09/17 21:16:55 UTC, 3 replies.
- KafkaDirectStream can't be recovered from checkpoint - posted by Petr Novak <os...@gmail.com> on 2015/09/17 21:36:42 UTC, 11 replies.
- Creating BlockMatrix with java API - posted by Pulasthi Supun Wickramasinghe <pu...@gmail.com> on 2015/09/17 21:36:44 UTC, 6 replies.
- selected field not getting pushed down into my DataSource? - posted by Timothy Potter <th...@gmail.com> on 2015/09/17 21:45:14 UTC, 0 replies.
- WAL on S3 - posted by Michal Čizmazia <mi...@gmail.com> on 2015/09/17 22:09:02 UTC, 16 replies.
- Create view on nested JSON doesn't recognize column names - posted by Dan LaBar <da...@gmail.com> on 2015/09/17 23:17:02 UTC, 0 replies.
- Cache after filter Vs Writing back to HDFS - posted by Gavin Yue <yu...@gmail.com> on 2015/09/17 23:17:41 UTC, 1 replies.
- Distribute JMS receiver jobs on YARN - posted by ni...@free.fr on 2015/09/17 23:51:52 UTC, 0 replies.
- Spark streaming to database exception handling - posted by david w <df...@gmail.com> on 2015/09/17 23:56:43 UTC, 2 replies.
- Spark w/YARN Scheduling Questions... - posted by Robert Saccone <rs...@gmail.com> on 2015/09/18 00:31:58 UTC, 1 replies.
- Performance changes quite large - posted by Gavin Yue <yu...@gmail.com> on 2015/09/18 01:59:20 UTC, 0 replies.
- DecisionTree hangs, then crashes - posted by jluan <ja...@gmail.com> on 2015/09/18 02:24:26 UTC, 0 replies.
- master hung after killing the streaming sc - posted by ZhuGe <tc...@outlook.com> on 2015/09/18 08:35:00 UTC, 1 replies.
- Notification on Spark Streaming job failure - posted by Krzysztof Zarzycki <k....@gmail.com> on 2015/09/18 08:35:59 UTC, 1 replies.
- Running the deep-learning the application on cluster: - posted by Angel Angel <ar...@gmail.com> on 2015/09/18 11:17:23 UTC, 0 replies.
- Zeppelin on Yarn : org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't running on a cluster. Deployment to YARN is not supported directly by SparkContext. Please use spark-submit. - posted by shahab <sh...@gmail.com> on 2015/09/18 12:07:56 UTC, 4 replies.
- Constant Spark execution time with different # of slaves - posted by Warfish <se...@gmail.com> on 2015/09/18 13:12:10 UTC, 0 replies.
- GraphX to work with Streaming - posted by Rohit Kumar <ro...@gmail.com> on 2015/09/18 15:37:46 UTC, 0 replies.
- spark 1.5, ML Pipeline Decision Tree Dataframe Problem - posted by Yasemin Kaya <go...@gmail.com> on 2015/09/18 16:32:22 UTC, 2 replies.
- Python UDF and explode error - posted by Pavel Burdanov <bu...@mail.ru> on 2015/09/18 16:36:15 UTC, 1 replies.
- Breakpoints not hit with Scalatest + intelliJ - posted by Michel Lemay <ml...@gmail.com> on 2015/09/18 16:37:02 UTC, 1 replies.
- Spark Streaming checkpoint recovery throws Stack Overflow Error - posted by swetha <sw...@gmail.com> on 2015/09/18 18:15:19 UTC, 2 replies.
- SparkContext declared as object variable - posted by Priya Ch <le...@gmail.com> on 2015/09/18 19:51:32 UTC, 5 replies.
- unsubscribe - posted by Nambi <na...@gmail.com> on 2015/09/18 22:11:07 UTC, 7 replies.
- Does anyone use ShuffleDependency directly? - posted by Josh Rosen <jo...@databricks.com> on 2015/09/18 22:17:14 UTC, 0 replies.
- SparkR pca? - posted by Deborah Siegel <de...@gmail.com> on 2015/09/18 22:41:23 UTC, 0 replies.
- Not able to group by Scala UDF - posted by Jeff Jones <jj...@adaptivebiotech.com> on 2015/09/18 23:39:52 UTC, 0 replies.
- What's the best practice to parse JSON using spark - posted by Cui Lin <ic...@gmail.com> on 2015/09/19 02:09:24 UTC, 5 replies.
- SparkML pipelines and error recovery - posted by Fatma Ozcan <fa...@gmail.com> on 2015/09/19 03:37:10 UTC, 0 replies.
- Using Spark for portfolio manager app - posted by Thúy Hằng Lê <th...@gmail.com> on 2015/09/19 04:43:53 UTC, 12 replies.
- question building spark in a virtual machine - posted by Eyal Altshuler <ey...@gmail.com> on 2015/09/19 14:31:07 UTC, 8 replies.
- Docker/Mesos with Spark - posted by John Omernik <jo...@omernik.com> on 2015/09/19 14:42:41 UTC, 1 replies.
- word count (group by users) in spark - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2015/09/19 15:11:37 UTC, 5 replies.
- Re: Kafka createDirectStream issue - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2015/09/19 22:07:24 UTC, 1 replies.
- Unable to see my kafka spark streaming output - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2015/09/19 22:30:26 UTC, 1 replies.
- PrunedFilteredScan does not work for UDTs and Struct fields - posted by Richard Eggert <ri...@gmail.com> on 2015/09/20 00:59:09 UTC, 2 replies.
- DataGenerator for streaming application - posted by Saiph Kappa <sa...@gmail.com> on 2015/09/20 06:26:48 UTC, 2 replies.
- Problem at sbt/sbt assembly - posted by Aaroncq4 <47...@qq.com> on 2015/09/21 03:11:07 UTC, 2 replies.
- Web UI for Spark Streaming app lists jobs incorrectly - posted by Jon Chase <jo...@gmail.com> on 2015/09/21 03:28:18 UTC, 1 replies.
- Spark DAG Visualization for HiveQL - posted by Narayanan K <kn...@gmail.com> on 2015/09/21 04:33:06 UTC, 3 replies.
- What is a taskBinary for a ShuffleMapTask? What is its purpose? - posted by Muler <mu...@gmail.com> on 2015/09/21 08:06:26 UTC, 0 replies.
- Re: Class cast exception : Spark 1.5 - posted by sim <si...@swoop.com> on 2015/09/21 08:22:59 UTC, 0 replies.
- Fwd: Issue with high no of skipped task - posted by Saurav Sinha <sa...@gmail.com> on 2015/09/21 08:53:55 UTC, 2 replies.
- Spark Lost executor && shuffle.FetchFailedException - posted by bi...@gmail.com on 2015/09/21 09:10:48 UTC, 1 replies.
- Hbase Spark streaming issue. - posted by Siva <sb...@gmail.com> on 2015/09/21 09:46:11 UTC, 1 replies.
- Deploying spark-streaming application on production - posted by Jeetendra Gangele <ga...@gmail.com> on 2015/09/21 10:09:16 UTC, 8 replies.
- passing SparkContext as parameter - posted by Priya Ch <le...@gmail.com> on 2015/09/21 11:27:20 UTC, 12 replies.
- mongo-hadoop with Spark is slow for me, and adding nodes doesn't seem to make any noticeable difference - posted by cscarioni <ca...@simplybusiness.co.uk> on 2015/09/21 12:00:55 UTC, 0 replies.
- spark with internal ip - posted by ZhuGe <tc...@outlook.com> on 2015/09/21 12:05:19 UTC, 0 replies.
- How to get a new RDD by ordinarily subtract its adjacent rows - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2015/09/21 13:29:27 UTC, 8 replies.
- spark + parquet + schema name and metadata - posted by Borisa Zivkovic <bo...@gmail.com> on 2015/09/21 16:13:02 UTC, 7 replies.
- Why are executors on slave never used? - posted by Joshua Fox <jo...@twiggle.com> on 2015/09/21 16:37:11 UTC, 3 replies.
- Count for select not matching count for group by - posted by Michael Kelly <mi...@gmail.com> on 2015/09/21 17:06:29 UTC, 2 replies.
- sqlContext.read.avro broadcasting files from the driver - posted by Daniel Haviv <da...@veracity-group.com> on 2015/09/21 17:13:56 UTC, 0 replies.
- AWS_CREDENTIAL_FILE - posted by Michel Lemay <ml...@gmail.com> on 2015/09/21 17:34:05 UTC, 3 replies.
- Python Packages in Spark w/Mesos - posted by John Omernik <jo...@omernik.com> on 2015/09/21 17:34:27 UTC, 1 replies.
- Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first() - posted by Jerry Lam <ch...@gmail.com> on 2015/09/21 17:56:15 UTC, 6 replies.
- Exception initializing JavaSparkContext - posted by ekraffmiller <el...@gmail.com> on 2015/09/21 18:14:07 UTC, 4 replies.
- how to get RDD from two different RDDs with cross column - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2015/09/21 19:33:49 UTC, 2 replies.
- JDBCRdd issue - posted by "Saurabh Malviya (samalviy)" <sa...@cisco.com> on 2015/09/21 20:03:42 UTC, 0 replies.
- Spark Streaming and Kafka MultiNode Setup - Data Locality - posted by Ashish Soni <as...@gmail.com> on 2015/09/21 20:15:49 UTC, 2 replies.
- Serialization Error with PartialFunction / immutable sets - posted by Chaney Courtney <ch...@santoslab.org> on 2015/09/21 20:41:15 UTC, 0 replies.
- HiveQL Compatibility (0.12.0, 0.13.0???) - posted by Dominic Ricard <Do...@tritondigital.com> on 2015/09/21 22:21:48 UTC, 1 replies.
- Slow Performance with Apache Spark Gradient Boosted Tree training runs - posted by vkutsenko <vl...@gmail.com> on 2015/09/21 22:41:37 UTC, 1 replies.
- Mesos Tasks only run on one node - posted by John Omernik <jo...@omernik.com> on 2015/09/21 22:58:48 UTC, 1 replies.
- Spark Streaming distributed job - posted by ni...@free.fr on 2015/09/22 00:51:08 UTC, 1 replies.
- Iterator-based streaming, how is it efficient ? - posted by Samuel Hailu <sa...@gmail.com> on 2015/09/22 01:05:19 UTC, 0 replies.
- Troubleshooting "Task not serializable" in Spark/Scala environments - posted by Balaji Vijayan <ba...@gmail.com> on 2015/09/22 01:07:38 UTC, 4 replies.
- Re: Long GC pauses with Spark SQL 1.3.0 and billion row tables - posted by tridib <tr...@live.com> on 2015/09/22 01:20:01 UTC, 5 replies.
- spark.mesos.coarse impacts memory performance on mesos - posted by Utkarsh Sengar <ut...@gmail.com> on 2015/09/22 02:18:33 UTC, 5 replies.
- Remove duplicate keys by always choosing first in file. - posted by Philip Weaver <ph...@gmail.com> on 2015/09/22 02:26:01 UTC, 13 replies.
- How does one use s3 for checkpointing? - posted by Amit Ramesh <am...@yelp.com> on 2015/09/22 03:24:18 UTC, 2 replies.
- Invalid checkpoint url - posted by srungarapu vamsi <sr...@gmail.com> on 2015/09/22 06:59:15 UTC, 5 replies.
- Spark Ingestion into Relational DB - posted by Sri <sr...@gmail.com> on 2015/09/22 07:13:49 UTC, 4 replies.
- Streaming Receiver Imbalance Problem - posted by SLiZn Liu <sl...@gmail.com> on 2015/09/22 09:17:22 UTC, 4 replies.
- Uneven distribution of tasks among workers in Spark/GraphX 1.5.0 - posted by dmytro <be...@gmail.com> on 2015/09/22 11:18:53 UTC, 0 replies.
- SparkR for accumulo - posted by "madhvi.gupta" <ma...@orkash.com> on 2015/09/22 12:25:21 UTC, 5 replies.
- Querying on multiple Hive stores using Apache Spark - posted by Karthik <ka...@impetus.co.in> on 2015/09/22 12:44:02 UTC, 2 replies.
- Why RDDs are being dropped by Executors? - posted by Uthayan Suthakar <ut...@gmail.com> on 2015/09/22 13:20:30 UTC, 5 replies.
- Why is 1 executor overworked and other sit idle? - posted by Chirag Dewan <ch...@ericsson.com> on 2015/09/22 13:22:41 UTC, 4 replies.
- Heap Space Error - posted by Yusuf Can Gürkan <yu...@useinsider.com> on 2015/09/22 13:28:07 UTC, 1 replies.
- Re: MLlib inconsistent documentation - posted by Yashwanth Kumar <ya...@tcs.com> on 2015/09/22 14:04:46 UTC, 0 replies.
- spark-avro takes a lot time to load thousands of files - posted by Daniel Haviv <da...@veracity-group.com> on 2015/09/22 14:10:49 UTC, 4 replies.
- Py4j issue with Python Kafka Module - posted by ayan guha <gu...@gmail.com> on 2015/09/22 15:41:08 UTC, 5 replies.
- Performance Spark SQL vs Dataframe API faster - posted by sanderg <s....@wimionline.be> on 2015/09/22 16:05:31 UTC, 1 replies.
- spark on mesos gets killed by cgroups for too much memory - posted by oggie <go...@gmail.com> on 2015/09/22 16:19:20 UTC, 1 replies.
- Error while saving parquet - posted by gtinside <gt...@gmail.com> on 2015/09/22 16:37:22 UTC, 0 replies.
- Help getting started with Kafka - posted by Yana Kadiyska <ya...@gmail.com> on 2015/09/22 16:38:30 UTC, 2 replies.
- Apache Spark job in local[*] is slower than regular 1-thread Python program - posted by juljoin <ju...@hotmail.com> on 2015/09/22 17:37:47 UTC, 2 replies.
- Spark 1.5 UDAF ArrayType - posted by Deenar Toraskar <de...@gmail.com> on 2015/09/22 20:13:16 UTC, 3 replies.
- How to share memory in a broadcast between tasks in the same executor? - posted by Clément Frison <cl...@gmail.com> on 2015/09/22 20:42:59 UTC, 2 replies.
- pyspark question: create RDD from csr_matrix - posted by jeff saremi <je...@hotmail.com> on 2015/09/22 21:02:27 UTC, 0 replies.
- Spark as standalone or with Hadoop stack. - posted by Shiv Kandavelu <sh...@riversand.com> on 2015/09/22 21:25:34 UTC, 5 replies.
- KafkaProducer using Cassandra as source - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2015/09/22 22:14:44 UTC, 2 replies.
- Partitions on RDDs - posted by XIANDI <zx...@hotmail.com> on 2015/09/23 00:41:58 UTC, 2 replies.
- HDP 2.3 support for Spark 1.5.x - posted by Krishna Sankar <ks...@gmail.com> on 2015/09/23 00:42:34 UTC, 4 replies.
- SPARK_WORKER_INSTANCES was detected (set to '2')…This is deprecated in Spark 1.0+ - posted by Jacek Laskowski <ja...@japila.pl> on 2015/09/23 00:57:55 UTC, 0 replies.
- Yarn Shutting Down Spark Processing - posted by Bryan Jeffrey <br...@gmail.com> on 2015/09/23 02:49:06 UTC, 7 replies.
- Spark standalone/Mesos on top of Ceph - posted by "fightfate@163.com" <fi...@163.com> on 2015/09/23 03:28:30 UTC, 3 replies.
- how to submit the spark job outside the cluster - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2015/09/23 03:37:38 UTC, 11 replies.
- Parallel collection in driver programs - posted by Andy Huang <an...@servian.com.au> on 2015/09/23 04:39:15 UTC, 0 replies.
- Scala Limitation - Case Class definition with more than 22 arguments - posted by satish chandra j <js...@gmail.com> on 2015/09/23 05:48:56 UTC, 11 replies.
- JdbcRDD Constructor - posted by satish chandra j <js...@gmail.com> on 2015/09/23 07:30:34 UTC, 7 replies.
- Re: SparkR vs R - posted by Yashwanth Kumar <ya...@tcs.com> on 2015/09/23 07:39:56 UTC, 0 replies.
- How to make Group By/reduceByKey more efficient? - posted by swetha <sw...@gmail.com> on 2015/09/23 07:43:43 UTC, 1 replies.
- How to get RDD from PairRDD in Java - posted by "Zhang, Jingyu" <ji...@news.com.au> on 2015/09/23 08:24:59 UTC, 2 replies.
- Topic Modelling- LDA - posted by Subshiri S <su...@gmail.com> on 2015/09/23 08:54:21 UTC, 1 replies.
- How to subtract two RDDs with same size - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2015/09/23 09:11:04 UTC, 3 replies.
- Is it possible to merged delayed batches in streaming? - posted by Bin Wang <wb...@gmail.com> on 2015/09/23 09:17:13 UTC, 1 replies.
- How to control spark.sql.shuffle.partitions per query - posted by tridib <tr...@live.com> on 2015/09/23 09:42:55 UTC, 1 replies.
- Checkpoint files are saved before stream is saved to file (rdd.toDF().write ...)? - posted by Petr Novak <os...@gmail.com> on 2015/09/23 10:49:49 UTC, 2 replies.
- Dose spark auto invoke StreamingContext.stop while receive kill signal? - posted by Bin Wang <wb...@gmail.com> on 2015/09/23 12:33:19 UTC, 2 replies.
- Updation of a graph based on changed input - posted by aparasur <pa...@gmail.com> on 2015/09/23 12:38:20 UTC, 0 replies.
- Can DataFrames with different schema be joined efficiently - posted by MrJew <ko...@gmail.com> on 2015/09/23 12:51:21 UTC, 0 replies.
- Cosine LSH Join - posted by Demir <oe...@gmx.de> on 2015/09/23 15:02:48 UTC, 3 replies.
- Calling a method parallel - posted by Tapan Sharma <ta...@gmail.com> on 2015/09/23 15:46:21 UTC, 2 replies.
- K Means Explanation - posted by Tapan Sharma <ta...@gmail.com> on 2015/09/23 15:54:29 UTC, 1 replies.
- How to turn off Jetty Http stack errors on Spark web - posted by Rafal Grzymkowski <my...@o2.pl> on 2015/09/23 15:56:17 UTC, 2 replies.
- create table in hive from spark-sql - posted by Mohit Singh <mo...@gmail.com> on 2015/09/23 19:30:51 UTC, 0 replies.
- How to turn on basic authentication for the Spark Web - posted by Rafal Grzymkowski <my...@o2.pl> on 2015/09/23 20:13:36 UTC, 3 replies.
- Provide sampling ratio while loading json in spark version > 1.4.0 - posted by Udit Mehta <um...@groupon.com> on 2015/09/23 21:09:33 UTC, 0 replies.
- reduceByKeyAndWindow confusion - posted by srungarapu vamsi <sr...@gmail.com> on 2015/09/23 21:51:26 UTC, 1 replies.
- Java Heap Space Error - posted by Yusuf Can Gürkan <yu...@useinsider.com> on 2015/09/23 22:07:17 UTC, 12 replies.
- Join over many small files - posted by "Tracewski, Lukasz " <lu...@credit-suisse.com> on 2015/09/23 23:31:04 UTC, 2 replies.
- Custom Hadoop InputSplit, Spark partitions, spark executors/task and Yarn containers - posted by Anfernee Xu <an...@gmail.com> on 2015/09/23 23:38:47 UTC, 3 replies.
- How to obtain the key in updateStateByKey - posted by swetha <sw...@gmail.com> on 2015/09/24 00:01:12 UTC, 1 replies.
- LogisticRegression models consumes all driver memory - posted by Eugene Zhulenev <eu...@gmail.com> on 2015/09/24 00:19:46 UTC, 6 replies.
- [POWERED BY] Please add our organization - posted by Oleg Shirokikh <Ol...@solver.com> on 2015/09/24 00:41:42 UTC, 1 replies.
- Debugging too many files open exception issue in Spark shuffle - posted by DB Tsai <db...@dbtsai.com> on 2015/09/24 00:53:54 UTC, 2 replies.
- CrossValidator speed - for loop on each parameter map? - posted by julia <vi...@adobe.com> on 2015/09/24 01:42:34 UTC, 0 replies.
- caching DataFrames - posted by "Zhang, Jingyu" <ji...@news.com.au> on 2015/09/24 02:07:10 UTC, 3 replies.
- Odd behavior when re-defining a val in spark-shell - posted by Boris Alexeev <bo...@gmail.com> on 2015/09/24 02:53:58 UTC, 0 replies.
- Spark 1.5.0 on YARN dynamicAllocation - Initial job has not accepted any resources - posted by Jonathan Kelly <jo...@gmail.com> on 2015/09/24 03:04:02 UTC, 5 replies.
- Fwd: Executor lost - posted by Angel Angel <ar...@gmail.com> on 2015/09/24 03:18:45 UTC, 0 replies.
- Spark ClosureCleaner or java serializer OOM when trying to grow - posted by jluan <ja...@gmail.com> on 2015/09/24 03:59:35 UTC, 2 replies.
- How to fix some WARN when submit job on spark 1.5 YARN - posted by "r7raul1984@163.com" <r7...@163.com> on 2015/09/24 04:25:34 UTC, 2 replies.
- KMeans Model fails to run - posted by "Soong, Eddie" <ed...@zementis.com> on 2015/09/24 05:19:13 UTC, 0 replies.
- No space left on device when running graphx job - posted by Jack Yang <ji...@uow.edu.au> on 2015/09/24 08:29:12 UTC, 5 replies.
- Join two dataframe - Timeout after 5 minutes - posted by Eyad Sibai <ey...@gmail.com> on 2015/09/24 09:16:03 UTC, 1 replies.
- Fwd: Spark streaming DStream state on worker - posted by Shixiong Zhu <zs...@gmail.com> on 2015/09/24 10:51:03 UTC, 0 replies.
- Re: JobScheduler: Error generating jobs for time for custom InputDStream - posted by Shixiong Zhu <zs...@gmail.com> on 2015/09/24 11:04:35 UTC, 1 replies.
- Legacy Python code - posted by Joshua Fox <jo...@twiggle.com> on 2015/09/24 11:36:47 UTC, 0 replies.
- GroupBy Java objects in Java Spark - posted by Ramkumar V <ra...@gmail.com> on 2015/09/24 11:45:10 UTC, 2 replies.
- Exception during SaveAstextFile Stage - posted by Chirag Dewan <ch...@ericsson.com> on 2015/09/24 12:28:54 UTC, 0 replies.
- Not fetching all records from Cassandra DB - posted by satish chandra j <js...@gmail.com> on 2015/09/24 13:55:05 UTC, 0 replies.
- kafka direct streaming with checkpointing - posted by Radu Brumariu <br...@gmail.com> on 2015/09/24 15:40:11 UTC, 10 replies.
- Networking issues with Spark on EC2 - posted by SURAJ SHETH <sh...@gmail.com> on 2015/09/24 16:08:55 UTC, 4 replies.
- Re: Long running Spark Streaming Job increasing executing time per batch - posted by Jeremy Smith <je...@acorns.com> on 2015/09/24 16:35:25 UTC, 0 replies.
- Scala api end points - posted by masoom alam <ma...@wanclouds.net> on 2015/09/24 17:20:25 UTC, 1 replies.
- Reading avro data using KafkaUtils.createDirectStream - posted by Daniel Haviv <da...@veracity-group.com> on 2015/09/24 17:30:12 UTC, 1 replies.
- NegativeArraySizeException on Spark SQL window function - posted by "Bae, Jae Hyeon" <me...@gmail.com> on 2015/09/24 18:44:13 UTC, 0 replies.
- reduceByKey inside updateStateByKey in Spark Streaming??? - posted by swetha <sw...@gmail.com> on 2015/09/24 18:47:45 UTC, 1 replies.
- Large number of conf broadcasts - posted by Anders Arpteg <ar...@spotify.com> on 2015/09/24 19:24:48 UTC, 0 replies.
- Using Map and Basic Operators yield java.lang.ClassCastException (Parquet + Hive + Spark SQL 1.5.0 + Thrift) - posted by Dominic Ricard <Do...@tritondigital.com> on 2015/09/24 20:34:31 UTC, 4 replies.
- Potential racing condition in DAGScheduler when Spark 1.5 caching - posted by robin_up <ro...@gmail.com> on 2015/09/24 21:00:17 UTC, 3 replies.
- why more than more jobs in a batch in spark streaming ? - posted by "Shenghua(Daniel) Wan" <wa...@gmail.com> on 2015/09/24 21:04:15 UTC, 1 replies.
- Pyspark throws: java.net.BindException: Cannot assign requested address - posted by lucask <lu...@cloud101.eu> on 2015/09/24 22:24:58 UTC, 0 replies.
- ERROR BoundedByteBufferReceive: OOME with size 352518400 - posted by Sourabh Chandak <so...@gmail.com> on 2015/09/24 22:25:46 UTC, 6 replies.
- executor-cores setting does not work under Yarn - posted by Gavin Yue <yu...@gmail.com> on 2015/09/24 22:28:46 UTC, 2 replies.
- Reasonable performance numbers? - posted by "Young, Matthew T" <ma...@intel.com> on 2015/09/24 22:47:54 UTC, 1 replies.
- Reading Hive Tables using SQLContext - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2015/09/24 23:47:08 UTC, 3 replies.
- Unable to start spark-shell on YARN - posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/09/25 00:11:26 UTC, 2 replies.
- Why Checkpoint is throwing "actor.OneForOneStrategy: NullPointerException" - posted by Uthayan Suthakar <ut...@gmail.com> on 2015/09/25 02:02:04 UTC, 3 replies.
- Exception on save s3n file (1.4.1, hadoop 2.6) - posted by "Zhang, Jingyu" <ji...@news.com.au> on 2015/09/25 04:35:00 UTC, 1 replies.
- Stop a Dstream computation - posted by Samya <sa...@amadeus.com> on 2015/09/25 06:20:52 UTC, 1 replies.
- spark.streaming.concurrentJobs - posted by Atul Kulkarni <at...@gmail.com> on 2015/09/25 08:56:07 UTC, 2 replies.
- SQLContext.read().json() inferred schema - force type to strings? - posted by Ewan Leith <ew...@realitymine.com> on 2015/09/25 09:45:10 UTC, 0 replies.
- Weird worker usage - posted by N B <nb...@gmail.com> on 2015/09/25 10:14:58 UTC, 9 replies.
- Error: Asked to remove non-existent executor - posted by "Tracewski, Lukasz " <lu...@credit-suisse.com> on 2015/09/25 10:35:18 UTC, 0 replies.
- Setting Spark TMP Directory in Cluster Mode - posted by mufy <mu...@gmail.com> on 2015/09/25 10:44:08 UTC, 3 replies.
- Troubles interacting with different version of Hive metastore - posted by Ferran Galí <fe...@trovit.com> on 2015/09/25 13:02:08 UTC, 0 replies.
- Unreachable dead objects permanently retained on heap - posted by James Aley <ja...@swiftkey.com> on 2015/09/25 13:31:11 UTC, 1 replies.
- How to set spark environment variable SPARK_LOCAL_IP in conf/spark-env.sh - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2015/09/25 13:46:01 UTC, 1 replies.
- Spark task error - posted by "madhvi.gupta" <ma...@orkash.com> on 2015/09/25 13:50:40 UTC, 0 replies.
- sometimes No event logs found for application using same JavaSparkSQL example - posted by "ouruia@cnsuning.com" <ou...@cnsuning.com> on 2015/09/25 14:00:45 UTC, 1 replies.
- Fwd: sometimes No event logs found for application using same JavaSparkSQL example - posted by "ouruia@cnsuning.com" <ou...@cnsuning.com> on 2015/09/25 14:18:08 UTC, 0 replies.
- Re: Error: Asked to remove non-existent executor - posted by Akhil Das <ak...@sigmoidanalytics.com> on 2015/09/25 14:33:30 UTC, 0 replies.
- Re: sometimes No event logs found for application using same JavaSparkSQL example - posted by "ouruia@cnsuning.com" <ou...@cnsuning.com> on 2015/09/25 14:38:40 UTC, 0 replies.
- Transformation pipelining and parallelism in Spark - posted by Zhongmiao Li <ma...@gmail.com> on 2015/09/25 14:48:41 UTC, 0 replies.
- Receiver and Parallelization - posted by ni...@free.fr on 2015/09/25 15:08:04 UTC, 3 replies.
- Best practices for scheduling Spark jobs on "shared" YARN cluster using Autosys - posted by unk1102 <um...@gmail.com> on 2015/09/25 15:54:46 UTC, 0 replies.
- HDFS is undefined - posted by Angel Angel <ar...@gmail.com> on 2015/09/25 16:13:03 UTC, 2 replies.
- --class has to be always specified in spark-submit either it is defined in jar manifest? - posted by Petr Novak <os...@gmail.com> on 2015/09/25 16:40:05 UTC, 2 replies.
- Generic DataType in UDAF - posted by Ritesh Agrawal <ra...@netflix.com.INVALID> on 2015/09/25 17:07:39 UTC, 2 replies.
- Handle null/NaN values in mllib classifier - posted by matd <ma...@gmail.com> on 2015/09/25 18:44:26 UTC, 0 replies.
- hive on spark query error - posted by Garry Chen <gc...@cornell.edu> on 2015/09/25 18:56:31 UTC, 5 replies.
- java.io.NotSerializableException: org.apache.avro.Schema$RecordSchema - posted by Daniel Haviv <da...@veracity-group.com> on 2015/09/25 19:06:44 UTC, 2 replies.
- Kafka & Spark Streaming - posted by Neelesh <ne...@gmail.com> on 2015/09/25 19:50:01 UTC, 7 replies.
- Broadcast to executors with multiple cores - posted by Jeff Palmucci <jp...@tripadvisor.com> on 2015/09/25 20:14:50 UTC, 0 replies.
- Convert Vector to RDD[Double] - posted by Yusuf Can Gürkan <yu...@useinsider.com> on 2015/09/25 20:46:46 UTC, 1 replies.
- how to handle OOMError from groupByKey - posted by Elango Cheran <el...@gmail.com> on 2015/09/25 21:35:33 UTC, 4 replies.
- Distance metrics in KMeans - posted by bobtreacy <rt...@columbia.edu> on 2015/09/25 22:58:45 UTC, 2 replies.
- Is this a Spark issue or Hive issue that Spark cannot read the string type data in the Parquet generated by Hive - posted by java8964 <ja...@hotmail.com> on 2015/09/25 23:03:28 UTC, 3 replies.
- how to control timeout in node failure for spark task ? - posted by roy <rp...@njit.edu> on 2015/09/25 23:32:42 UTC, 0 replies.
- [SPARK-SQL] Requested array size exceeds VM limit - posted by Sadhan Sood <sa...@gmail.com> on 2015/09/26 00:00:40 UTC, 0 replies.
- GraphX create graph with multiple node attributes - posted by JJ <je...@gmail.com> on 2015/09/26 00:07:46 UTC, 5 replies.
- Fwd: Spark for Oracle sample code - posted by Cui Lin <ic...@gmail.com> on 2015/09/26 00:13:37 UTC, 3 replies.
- Spark SQL: Native Support for LATERAL VIEW EXPLODE - posted by Jerry Lam <ch...@gmail.com> on 2015/09/26 01:21:42 UTC, 3 replies.
- Error in starting sparkR: Error in socketConnection(port = monitorPort) : - posted by Jonathan Yue <jy...@yahoo.com.INVALID> on 2015/09/26 01:37:13 UTC, 0 replies.
- How to properly set conf/spark-env.sh for spark to run on yarn - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2015/09/26 03:43:28 UTC, 8 replies.
- What is this Input Size in Spark Application Detail UI? - posted by Chirag Dewan <ch...@ericsson.com> on 2015/09/26 05:20:10 UTC, 0 replies.
- Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error - posted by N B <nb...@gmail.com> on 2015/09/26 08:29:54 UTC, 5 replies.
- Problem with multiple fields with same name in Avro - posted by Anders Arpteg <ar...@spotify.com> on 2015/09/26 13:48:13 UTC, 0 replies.
- Re: What are best practices from Unit Testing Spark Code? - posted by ehrlichja <an...@aehrlich.com> on 2015/09/26 23:04:10 UTC, 0 replies.
- queup jobs in spark cluster - posted by manish ranjan <cs...@gmail.com> on 2015/09/27 01:03:41 UTC, 1 replies.
- HDFS small file generation problem - posted by ni...@free.fr on 2015/09/27 15:36:29 UTC, 3 replies.
- textFile() and includePackage() not found - posted by Eugene Cao <eu...@163.com> on 2015/09/28 02:01:56 UTC, 1 replies.
- FP-growth on stream data - posted by masoom alam <ma...@wanclouds.net> on 2015/09/28 07:37:53 UTC, 0 replies.
- Spark 1.5.0 Not able to submit jobs using cluster URL - posted by lokeshkumar <lo...@dataken.net> on 2015/09/28 07:59:39 UTC, 4 replies.
- Master getting down with Memory issue. - posted by Saurav Sinha <sa...@gmail.com> on 2015/09/28 08:07:18 UTC, 4 replies.
- Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell - posted by Rick Moritz <ra...@gmail.com> on 2015/09/28 08:45:03 UTC, 4 replies.
- ML Pipeline - posted by Yasemin Kaya <go...@gmail.com> on 2015/09/28 10:05:35 UTC, 0 replies.
- "recommendProductsForUsers" makes worker node crash - posted by wanbo <ge...@163.com> on 2015/09/28 10:21:39 UTC, 0 replies.
- laziness in textFile reading from HDFS? - posted by davidkl <da...@hotmail.com> on 2015/09/28 10:39:40 UTC, 1 replies.
- Re: using multiple dstreams together (spark streaming) - posted by Archit Thakur <ar...@gmail.com> on 2015/09/28 11:11:37 UTC, 0 replies.
- Lower Consistency level : Retry - posted by Samya <sa...@amadeus.com> on 2015/09/28 11:32:25 UTC, 0 replies.
- log4j Spark-worker performance problem - posted by vaibhavrtk <va...@gmail.com> on 2015/09/28 12:27:41 UTC, 1 replies.
- CassandraSQLContext throwing NullPointer Exception - posted by Priya Ch <le...@gmail.com> on 2015/09/28 15:21:21 UTC, 3 replies.
- Update cassandra rows problem - posted by amine_901 <ch...@gmail.com> on 2015/09/28 15:59:27 UTC, 1 replies.
- Spark Streaming Log4j Inside Eclipse - posted by Ashish Soni <as...@gmail.com> on 2015/09/28 16:18:37 UTC, 6 replies.
- Spark REST Job server feedback? - posted by Ramirez Quetzal <ra...@gmail.com> on 2015/09/28 17:32:24 UTC, 0 replies.
- Interactively search Parquet-stored data using Spark Streaming and DataFrames - posted by Նարեկ Գալստեան <ng...@gmail.com> on 2015/09/28 17:45:13 UTC, 0 replies.
- Spark streaming job filling a lot of data in local spark nodes - posted by swetha <sw...@gmail.com> on 2015/09/28 19:04:39 UTC, 1 replies.
- Re: About memory leak in spark 1.4.1 - posted by Jon Chase <jo...@gmail.com> on 2015/09/28 19:34:57 UTC, 0 replies.
- Adding / Removing worker nodes for Spark Streaming - posted by Augustus Hong <au...@branchmetrics.io> on 2015/09/28 20:27:02 UTC, 5 replies.
- Re: Python script runs fine in local mode, errors in other modes - posted by Aaron <aa...@target.com> on 2015/09/28 20:37:10 UTC, 0 replies.
- SQL queries in Spark / YARN - posted by Robert Grandl <rg...@yahoo.com.INVALID> on 2015/09/28 21:46:29 UTC, 3 replies.
- UnknownHostException with Mesos and custom Jar - posted by Stephen Hankinson <st...@affinio.com> on 2015/09/28 22:15:50 UTC, 2 replies.
- Performance when iterating over many parquet files - posted by jwthomas <jo...@accenture.com> on 2015/09/28 22:35:27 UTC, 13 replies.
- java.lang.ClassCastException (org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task) - posted by amitra123 <am...@hotmail.com> on 2015/09/28 22:45:10 UTC, 2 replies.
- Reading kafka stream and writing to hdfs - posted by Chengi Liu <ch...@gmail.com> on 2015/09/28 22:45:57 UTC, 1 replies.
- Does YARN start new executor in place of the failed one? - posted by Alexander Pivovarov <ap...@gmail.com> on 2015/09/29 00:38:49 UTC, 1 replies.
- Get variable into Spark's foreachRDD function - posted by markluk <ma...@juicero.com> on 2015/09/29 01:06:51 UTC, 1 replies.
- nested collection object query - posted by tridib <tr...@live.com> on 2015/09/29 01:37:23 UTC, 4 replies.
- Monitoring tools for spark streaming - posted by Siva <sb...@gmail.com> on 2015/09/29 01:52:19 UTC, 5 replies.
- Merging two avro RDD/DataFrames - posted by TEST ONE <su...@cksworks.com> on 2015/09/29 02:00:24 UTC, 1 replies.
- Setting executors per worker - Standalone - posted by James Pirz <ja...@gmail.com> on 2015/09/29 02:24:38 UTC, 4 replies.
- spark-submit classloader issue... - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2015/09/29 04:01:31 UTC, 1 replies.
- Is MLBase dead? - posted by Justin Pihony <ju...@gmail.com> on 2015/09/29 04:19:06 UTC, 1 replies.
- Is there any tool provides per-task monitoring to figure out task skew in Spark streaming? - posted by 이기석 <ks...@gmail.com> on 2015/09/29 04:59:06 UTC, 1 replies.
- SparkContext._active_spark_context returns None - posted by YiZhi Liu <ja...@gmail.com> on 2015/09/29 05:08:25 UTC, 4 replies.
- Spark SQL: Implementing Custom Data Source - posted by Jerry Lam <ch...@gmail.com> on 2015/09/29 05:22:57 UTC, 4 replies.
- A non-canonical use of the Spark computation model - posted by Blarvomere <an...@gmail.com> on 2015/09/29 06:52:38 UTC, 0 replies.
- flatmap() and spark performance - posted by jeff saremi <je...@hotmail.com> on 2015/09/29 07:21:07 UTC, 1 replies.
- Fwd: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark - posted by Fernando Paladini <fn...@gmail.com> on 2015/09/29 07:37:32 UTC, 2 replies.
- Where are logs for Spark Kafka Yarn on Cloudera - posted by Rachana Srivastava <Ra...@markmonitor.com> on 2015/09/29 08:37:48 UTC, 1 replies.
- Sharding vs. Per-Timeframe Tables - posted by Jan Algermissen <al...@icloud.com> on 2015/09/29 09:01:06 UTC, 0 replies.
- OOM error in Spark worker - posted by varun sharma <va...@gmail.com> on 2015/09/29 09:05:53 UTC, 0 replies.
- Checkpointing not removing shuffle files from local disk - posted by ramibatal <ra...@gmail.com> on 2015/09/29 10:18:33 UTC, 0 replies.
- Re: How to find how much data will be train in mllib or how much the spark job is completed ? - posted by Robineast <Ro...@xense.co.uk> on 2015/09/29 12:47:08 UTC, 1 replies.
- Change Orc split size - posted by Renu Yadav <yr...@gmail.com> on 2015/09/29 13:10:35 UTC, 0 replies.
- Fetching Date value from spark.sql.row in Spark 1.2.2 - posted by satish chandra j <js...@gmail.com> on 2015/09/29 13:11:35 UTC, 1 replies.
- Spark Streaming many subscriptions vs many jobs - posted by Arttii <a....@reply.de> on 2015/09/29 13:42:52 UTC, 1 replies.
- PySpark Checkpoints with Broadcast Variables - posted by Jason White <ja...@shopify.com> on 2015/09/29 14:14:14 UTC, 1 replies.
- Re: Spark-Kafka Connector issue - posted by Cody Koeninger <co...@koeninger.org> on 2015/09/29 14:19:45 UTC, 0 replies.
- Hive alter table is failing - posted by Ophir Cohen <op...@gmail.com> on 2015/09/29 14:20:37 UTC, 2 replies.
- Kafka error "partitions don't have a leader" / LeaderNotAvailableException - posted by Dmitry Goldenberg <dg...@gmail.com> on 2015/09/29 14:26:18 UTC, 7 replies.
- Cant perform full outer join - posted by Sa...@wellsfargo.com on 2015/09/29 15:56:50 UTC, 1 replies.
- Converting a DStream to schemaRDD - posted by Daniel Haviv <da...@veracity-group.com> on 2015/09/29 16:02:38 UTC, 2 replies.
- RandomForestClassifer does not recognize number of classes, nor can number of classes be set - posted by Kristina Rogale Plazonic <kp...@gmail.com> on 2015/09/29 16:14:45 UTC, 1 replies.
- "Method json([class java.util.HashMap]) does not exist" when reading JSON - posted by Fernando Paladini <fn...@gmail.com> on 2015/09/29 16:23:05 UTC, 3 replies.
- Spark Job/Stage names - posted by Nithin Asokan <an...@gmail.com> on 2015/09/29 16:40:07 UTC, 1 replies.
- DStream union with different slideDuration - posted by "Goodall, Mark (UK)" <ma...@baesystems.com> on 2015/09/29 17:41:18 UTC, 0 replies.
- Executor Lost Failure - posted by Anup Sawant <an...@gmail.com> on 2015/09/29 18:02:29 UTC, 2 replies.
- input file from tar.gz - posted by Peter Rudenko <pe...@gmail.com> on 2015/09/29 20:39:03 UTC, 1 replies.
- Spark mailing list confusion - posted by Robineast <Ro...@xense.co.uk> on 2015/09/29 20:44:16 UTC, 1 replies.
- Best practices to call small spark jobs as part of REST api - posted by unk1102 <um...@gmail.com> on 2015/09/29 20:56:15 UTC, 0 replies.
- Dynamic DAG use-case for spark streaming. - posted by Archit Thakur <ar...@gmail.com> on 2015/09/29 21:06:26 UTC, 1 replies.
- Spark SQL deprecating Hive? How will I access Hive metadata in the future? - posted by YaoPau <jo...@gmail.com> on 2015/09/29 21:24:36 UTC, 0 replies.
- How to set System environment variables in Spark - posted by swetha <sw...@gmail.com> on 2015/09/29 21:29:59 UTC, 0 replies.
- Re: How to set System environment variables in Spark - posted by Ted Yu <yu...@gmail.com> on 2015/09/29 21:32:35 UTC, 1 replies.
- spark distributed linear system with sparse data - posted by Cameron McBride <ca...@gmail.com> on 2015/09/29 21:58:04 UTC, 0 replies.
- Re: Spark SQL deprecating Hive? How will I access Hive metadata in the future? - posted by Michael Armbrust <mi...@databricks.com> on 2015/09/29 22:24:17 UTC, 0 replies.
- ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException - posted by Dmitry Goldenberg <dg...@gmail.com> on 2015/09/29 22:37:16 UTC, 8 replies.
- Hive ORC Malformed while loading into spark data frame - posted by unk1102 <um...@gmail.com> on 2015/09/29 22:46:47 UTC, 5 replies.
- Does pyspark in cluster mode need python on individual executor nodes ? - posted by Ranjana Rajendran <ra...@gmail.com> on 2015/09/29 23:08:55 UTC, 2 replies.
- Re: pyspark-Failed to run first - posted by balajikvijayan <ba...@gmail.com> on 2015/09/30 00:53:14 UTC, 0 replies.
- using UDF( defined in Java) in scala through scala - posted by ogoh <ok...@gmail.com> on 2015/09/30 01:06:03 UTC, 0 replies.
- Yahoo's Caffe-on-Spark project - posted by Thomas Dudziak <to...@gmail.com> on 2015/09/30 01:13:28 UTC, 0 replies.
- unintended consequence of using coalesce operation - posted by Lan Jiang <lj...@gmail.com> on 2015/09/30 01:33:44 UTC, 1 replies.
- Fwd: Query about checkpointing time - posted by Jatin Ganhotra <ja...@gmail.com> on 2015/09/30 02:09:11 UTC, 1 replies.
- Spark Streaming Standalone 1.5 - Stage cancelled because SparkContext was shut down - posted by An Tran <tr...@gmail.com> on 2015/09/30 02:14:13 UTC, 1 replies.
- Self Join reading the HDFS blocks TWICE - posted by Data Science Education <da...@gmail.com> on 2015/09/30 02:21:48 UTC, 1 replies.
- Spark thrift service and Hive impersonation. - posted by Jagat Singh <ja...@gmail.com> on 2015/09/30 02:30:19 UTC, 5 replies.
- Fetching Date value from RDD of type spark.sql.row - posted by satish chandra j <js...@gmail.com> on 2015/09/30 08:00:04 UTC, 0 replies.
- Submitting with --deploy-mode cluster: uploading the jar - posted by Christophe Schmitz <co...@gmail.com> on 2015/09/30 09:13:35 UTC, 1 replies.
- Need for advice - performance improvement and out of memory resolution - posted by Camelia Elena Ciolac <ca...@chalmers.se> on 2015/09/30 09:58:55 UTC, 3 replies.
- Partition Column in JDBCRDD or Datasource API - posted by satish chandra j <js...@gmail.com> on 2015/09/30 10:40:35 UTC, 0 replies.
- [streaming] reading Kafka direct stream throws kafka.common.OffsetOutOfRangeException - posted by Alexey Ponkin <al...@ya.ru> on 2015/09/30 11:26:04 UTC, 1 replies.
- Combine key-value pair in spark java - posted by Ramkumar V <ra...@gmail.com> on 2015/09/30 11:34:20 UTC, 2 replies.
- Spark Streaming - posted by Amith sha <am...@gmail.com> on 2015/09/30 13:50:51 UTC, 0 replies.
- sc.parallelize with defaultParallelism=1 - posted by Nicolae Marasoiu <ni...@adswizz.com> on 2015/09/30 13:52:31 UTC, 3 replies.
- partition recomputation in big lineage RDDs - posted by Nicolae Marasoiu <ni...@adswizz.com> on 2015/09/30 14:05:16 UTC, 1 replies.
- How to tell Spark not to use /tmp for snappy-unknown-***-libsnappyjava.so - posted by Dmitry Goldenberg <dg...@gmail.com> on 2015/09/30 14:54:47 UTC, 2 replies.
- New spark meetup - posted by Yogesh Mahajan <ma...@gmail.com> on 2015/09/30 18:47:50 UTC, 0 replies.
- Re: [cache eviction] partition recomputation in big lineage RDDs - posted by Nicolae Marasoiu <ni...@adswizz.com> on 2015/09/30 19:29:40 UTC, 0 replies.
- Metadata in Parquet - posted by Philip Weaver <ph...@gmail.com> on 2015/09/30 19:54:23 UTC, 1 replies.
- Re: Hive permanent functions are not available in Spark SQL - posted by Pala M Muthaia <mc...@rocketfuelinc.com.INVALID> on 2015/09/30 20:01:13 UTC, 0 replies.
- What is the best way to submit multiple tasks? - posted by Sa...@wellsfargo.com on 2015/09/30 22:57:05 UTC, 0 replies.
- Lost leader exception in Kafka Direct for Streaming - posted by swetha <sw...@gmail.com> on 2015/09/30 23:31:41 UTC, 0 replies.
- Problem understanding spark word count execution - posted by Kartik Mathur <ka...@bluedata.com> on 2015/09/30 23:42:16 UTC, 0 replies.