user@spark.apache.org, 2016-11

- Re: why spark driver program is creating so many threads? How can I limit this number? - posted by kant kodali <ka...@gmail.com> on 2016/11/01 02:10:07 UTC, 9 replies.
- RE: Out Of Memory issue - posted by Kürşat Kurt <ku...@kursatkurt.com> on 2016/11/01 04:20:53 UTC, 4 replies.
- Re: java.lang.OutOfMemoryError: unable to create new native thread - posted by kant kodali <ka...@gmail.com> on 2016/11/01 05:32:48 UTC, 1 replies.
- Spark Job Failed with FileNotFoundException - posted by fanooos <de...@gmail.com> on 2016/11/01 07:16:44 UTC, 0 replies.
- Addition of two SparseVector - posted by "颜发才 (Yan Facai)" <ya...@gmail.com> on 2016/11/01 08:04:26 UTC, 0 replies.
- Python - Spark Cassandra Connector on DC/OS - posted by Andrew Holway <an...@otternetworks.de> on 2016/11/01 09:04:05 UTC, 1 replies.
- Streaming performance would be better with input than without - posted by wyj <wy...@meitu.com> on 2016/11/01 09:32:48 UTC, 0 replies.
- Re: Efficient filtering on Spark SQL dataframes with ordered keys - posted by Michael David Pedersen <mi...@googlemail.com> on 2016/11/01 10:01:43 UTC, 5 replies.
- Is IDF model reusable - posted by Nirav Patel <np...@xactlycorp.com> on 2016/11/01 10:10:25 UTC, 0 replies.
- Spark ML - Is IDF model reusable - posted by Nirav Patel <np...@xactlycorp.com> on 2016/11/01 10:15:10 UTC, 10 replies.
- Spark ML - CrossValidation - How to get Evaluation metrics of best model - posted by Nirav Patel <np...@xactlycorp.com> on 2016/11/01 12:10:15 UTC, 2 replies.
- Add jar files on classpath when submitting tasks to Spark - posted by Jan Botorek <Ja...@infor.com> on 2016/11/01 12:11:45 UTC, 10 replies.
- Application remains in WAITING state after Master election - posted by Alexis Seigneurin <as...@ipponusa.com> on 2016/11/01 15:46:24 UTC, 0 replies.
- Re: GraphFrame BFS - posted by Denny Lee <de...@gmail.com> on 2016/11/01 15:50:50 UTC, 0 replies.
- Re: Deep learning libraries for scala - posted by Benjamin Kim <bb...@gmail.com> on 2016/11/01 17:14:40 UTC, 1 replies.
- not table to connect to table using hiveContext - posted by vinay parekar <vp...@overstock.com> on 2016/11/01 23:30:25 UTC, 0 replies.
- Does Data pipeline using kafka and structured streaming work? - posted by shyla deshpande <de...@gmail.com> on 2016/11/01 23:45:57 UTC, 6 replies.
- Spark Streaming backpressure weird behavior/bug - posted by map reduced <k3...@gmail.com> on 2016/11/02 04:59:07 UTC, 12 replies.
- How to return a case class in map function? - posted by "颜发才 (Yan Facai)" <ya...@gmail.com> on 2016/11/02 07:01:35 UTC, 3 replies.
- How to avoid unnecessary spark starkups on every request? - posted by Fanjin Zeng <fj...@yahoo.com.INVALID> on 2016/11/02 07:34:24 UTC, 2 replies.
- random idea - posted by kant kodali <ka...@gmail.com> on 2016/11/02 08:37:00 UTC, 0 replies.
- Re: Duplicate rows in windowing functions - posted by Pankaj Wahane <pa...@live.com> on 2016/11/02 08:43:16 UTC, 1 replies.
- [Spark2] huge BloomFilters - posted by ponkin <al...@ya.ru> on 2016/11/02 10:27:26 UTC, 0 replies.
- unsubscribe - posted by Kunal Gaikwad <ar...@gmail.com> on 2016/11/02 10:47:27 UTC, 5 replies.
- Need to know about GraphX and Streaming - posted by "Md. Mahedi Kaysar" <md...@gmail.com> on 2016/11/02 11:20:09 UTC, 0 replies.
- Running Google Dataflow on Spark - posted by Ashutosh Kumar <km...@gmail.com> on 2016/11/02 11:48:09 UTC, 1 replies.
- Big Data Event London, 3-4th November 2016 from Tomorrow - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/02 12:25:14 UTC, 0 replies.
- Unsubscribe - posted by srikrishna chaitanya garimella <sr...@gmail.com> on 2016/11/02 13:57:19 UTC, 8 replies.
- Use a specific partition of dataframe - posted by Yanwei Zhang <ac...@hotmail.com> on 2016/11/02 16:28:46 UTC, 1 replies.
- Load whole ALS MatrixFactorizationModel into memory - posted by Mikael Ståldal <mi...@magine.com> on 2016/11/02 16:53:57 UTC, 1 replies.
- error: Unable to find encoder for type stored in a Dataset. when trying to map through a DataFrame - posted by Daniel Haviv <da...@veracity-group.com> on 2016/11/02 16:57:38 UTC, 1 replies.
- Custom receiver for WebSocket in Spark not working - posted by Cassa L <lc...@gmail.com> on 2016/11/02 17:23:23 UTC, 1 replies.
- Spark ML - Is it rule of thumb that all Estimators should only be Fit on Training data - posted by Nirav Patel <np...@xactlycorp.com> on 2016/11/02 18:05:00 UTC, 1 replies.
- Quirk in how Spark DF handles JSON input records? - posted by Michael Segel <ms...@hotmail.com> on 2016/11/02 18:50:13 UTC, 5 replies.
- BiMap BroadCast Variable - Kryo Serialization Issue - posted by Kalpana Jalawadi <ka...@gmail.com> on 2016/11/02 19:05:47 UTC, 0 replies.
- RuntimeException: Null value appeared in non-nullable field when holding Optional Case Class - posted by Aniket Bhatnagar <an...@gmail.com> on 2016/11/02 21:12:25 UTC, 1 replies.
- Re: mapwithstate Hangs with Error cleaning broadcast - posted by manasdebashiskar <po...@gmail.com> on 2016/11/02 21:59:45 UTC, 0 replies.
- Creating external tables in Spark 2.0.0 - posted by Anton Bubna-Litic <An...@quantium.com.au> on 2016/11/03 04:39:51 UTC, 0 replies.
- Increasing Executor threadpool - posted by map reduced <k3...@gmail.com> on 2016/11/03 04:48:33 UTC, 3 replies.
- distribute partitions evenly to my cluster - posted by heather79 <ro...@gmail.com> on 2016/11/03 06:05:10 UTC, 1 replies.
- Insert a JavaPairDStream into multiple cassandra table on the basis of key. - posted by Abhishek Anand <ab...@gmail.com> on 2016/11/03 06:28:58 UTC, 0 replies.
- Is Spark launcher's listener API considered production ready? - posted by Aseem Bansal <as...@gmail.com> on 2016/11/03 07:22:22 UTC, 1 replies.
- How to join dstream and JDBCRDD with checkpointing enabled - posted by saurabh3d <sa...@oracle.com> on 2016/11/03 08:14:29 UTC, 0 replies.
- SparkSQL with Hive got "java.lang.NullPointerException" - posted by lxw <lx...@qq.com> on 2016/11/03 10:20:07 UTC, 0 replies.
- LinearRegressionWithSGD and Rank Features By Importance - posted by "Carlo.Allocca" <ca...@open.ac.uk> on 2016/11/03 10:35:33 UTC, 13 replies.
- Confusion SparkSQL DataFrame OrderBy followed by GroupBY - posted by Rabin Banerjee <de...@gmail.com> on 2016/11/03 11:53:28 UTC, 17 replies.
- Delegation Token renewal in yarn-cluster - posted by Zsolt Tóth <to...@gmail.com> on 2016/11/03 14:22:53 UTC, 9 replies.
- incomplete aggregation in a GROUP BY - posted by Donald Matthews <dr...@gmail.com> on 2016/11/03 15:05:32 UTC, 1 replies.
- How do I convert a data frame to broadcast variable? - posted by "Jain, Nishit" <nj...@underarmour.com> on 2016/11/03 15:53:10 UTC, 4 replies.
- PySpark 2: Kmeans The input data is not directly cached - posted by Zakaria Hili <za...@gmail.com> on 2016/11/03 16:16:21 UTC, 0 replies.
- Re: mLIb solving linear regression with sparse inputs - posted by Robineast <Ro...@xense.co.uk> on 2016/11/03 18:07:04 UTC, 4 replies.
- example LDA code ClassCastException - posted by jamborta <ja...@gmail.com> on 2016/11/03 18:20:14 UTC, 2 replies.
- Aggregation Calculation - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/11/03 18:29:36 UTC, 3 replies.
- Fwd: Stream compressed data from KafkaUtils.createDirectStream - posted by baki hayat <ba...@gmail.com> on 2016/11/03 18:46:50 UTC, 3 replies.
- How do I specify StorageLevel in KafkaUtils.createDirectStream? - posted by kant kodali <ka...@gmail.com> on 2016/11/03 20:05:08 UTC, 0 replies.
- Slow Parquet write to HDFS using Spark - posted by morfious902002 <an...@gmail.com> on 2016/11/03 20:52:28 UTC, 0 replies.
- Spark XML ignore namespaces - posted by Arun Patel <ar...@gmail.com> on 2016/11/03 21:37:42 UTC, 1 replies.
- Use BLAS object for matrix operation - posted by Yanwei Zhang <ac...@hotmail.com> on 2016/11/03 23:04:11 UTC, 1 replies.
- Error creating SparkSession, in IntelliJ - posted by shyla deshpande <de...@gmail.com> on 2016/11/04 00:10:30 UTC, 2 replies.
- expected behavior of Kafka dynamic topic subscription - posted by Haopu Wang <HW...@qilinsoft.com> on 2016/11/04 02:43:40 UTC, 3 replies.
- sanboxing spark executors - posted by blazespinnaker <bl...@gmail.com> on 2016/11/04 06:41:19 UTC, 6 replies.
- Vector is not found in case class after import - posted by "颜发才 (Yan Facai)" <ya...@gmail.com> on 2016/11/04 08:43:02 UTC, 0 replies.
- InvalidClassException when load KafkaDirectStream from checkpoint (Spark 2.0.0) - posted by Haopu Wang <HW...@qilinsoft.com> on 2016/11/04 09:23:00 UTC, 1 replies.
- WARN 1 block locks were not released with MLlib ALS - posted by Mikael Ståldal <mi...@magine.com> on 2016/11/04 13:09:46 UTC, 0 replies.
- Instability issues with Spark 2.0.1 and Kafka 0.10 - posted by vonnagy <iv...@vadio.com> on 2016/11/04 17:20:04 UTC, 18 replies.
- SAXParseException while writing to parquet on s3 - posted by lminer <lm...@hotmail.com> on 2016/11/04 17:53:39 UTC, 0 replies.
- Clustering Webpages using KMean and Spark Apis : GC limit exceed. - posted by Reth RM <re...@gmail.com> on 2016/11/04 18:13:34 UTC, 1 replies.
- GenericRowWithSchema cannot be cast to java.lang.Double : UDAF error - posted by "Manjunath, Kiran" <ki...@akamai.com> on 2016/11/04 20:46:29 UTC, 1 replies.
- Spark Float to VectorUDT for ML evaluator lib - posted by Manish Tripathi <tr...@gmail.com> on 2016/11/04 20:51:02 UTC, 0 replies.
- NoSuchElementException when trying to use dataset - posted by levt <le...@numerify.com> on 2016/11/04 20:56:57 UTC, 0 replies.
- java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$ . Please Help!!!!!!! - posted by shyla deshpande <de...@gmail.com> on 2016/11/04 21:00:56 UTC, 2 replies.
- java.util.NoSuchElementException when trying to use dataset from worker - posted by levt <le...@numerify.com> on 2016/11/04 22:27:05 UTC, 0 replies.
- Upgrading to Spark 2.0.1 broke array in parquet DataFrame - posted by Sam Goodwin <sa...@gmail.com> on 2016/11/05 00:11:40 UTC, 1 replies.
- NoSuchElementException - posted by Lev Tsentsiper <le...@numerify.com> on 2016/11/05 00:15:48 UTC, 1 replies.
- SparkLauncer 2.0.1 version working incosistently in yarn-client mode - posted by Elkhan Dadashov <el...@gmail.com> on 2016/11/05 09:54:51 UTC, 2 replies.
- why visitCreateFileFormat doesn`t support hive STORED BY ,just support store as - posted by 母延年 (YDB Technical Support) <18...@qq.com> on 2016/11/05 12:27:55 UTC, 0 replies.
- Optimized way to use spark as db to hdfs etl - posted by Rohit Verma <ro...@rokittech.com> on 2016/11/05 14:39:12 UTC, 2 replies.
- Reading csv files with quoted fields containing embedded commas - posted by Femi Anthony <fe...@gmail.com> on 2016/11/05 21:58:46 UTC, 2 replies.
- Spark dataset cache vs tempview - posted by Rohit Verma <ro...@rokittech.com> on 2016/11/06 03:44:36 UTC, 1 replies.
- mapWithState and DataFrames - posted by Daniel Haviv <da...@veracity-group.com> on 2016/11/06 10:53:31 UTC, 1 replies.
- Fwd: A Spark long running program as web server - posted by Reza zade <kn...@gmail.com> on 2016/11/06 13:06:23 UTC, 2 replies.
- Improvement proposal | Dynamic disk allocation - posted by Aniket Bhatnagar <an...@gmail.com> on 2016/11/06 13:06:55 UTC, 1 replies.
- Very long pause/hang at end of execution - posted by Michael Johnson <mj...@yahoo.com.INVALID> on 2016/11/06 13:28:13 UTC, 10 replies.
- Newbie question - Best way to bootstrap with Spark - posted by raghav <ra...@gmail.com> on 2016/11/07 00:57:01 UTC, 8 replies.
- Structured Streaming with Kafka source,, does it work?????? - posted by shyla deshpande <de...@gmail.com> on 2016/11/07 01:15:17 UTC, 0 replies.
- Re: Structured Streaming with Kafka Source, does it work?? - posted by shyla deshpande <de...@gmail.com> on 2016/11/07 01:25:58 UTC, 1 replies.
- hope someone can recommend some books for me,a spark beginner - posted by litg <19...@qq.com> on 2016/11/07 03:00:16 UTC, 2 replies.
- Re: Structured Streaming with Kafka source,, does it work?????? - posted by "余根茂(木艮)" <ge...@alibaba-inc.com> on 2016/11/07 03:08:43 UTC, 0 replies.
- Error while creating tables in Parquet format in 2.0.1 (No plan for InsertIntoTable) - posted by Kiran Chitturi <ki...@lucidworks.com> on 2016/11/07 04:08:02 UTC, 1 replies.
- Spark-packages - posted by Stephen Boesch <ja...@gmail.com> on 2016/11/07 04:18:22 UTC, 1 replies.
- Re: spark streaming with kinesis - posted by Shushant Arora <sh...@gmail.com> on 2016/11/07 04:36:48 UTC, 9 replies.
- Out of memory at 60GB free memory. - posted by Kürşat Kurt <ku...@kursatkurt.com> on 2016/11/07 05:32:28 UTC, 6 replies.
- Spark Exits with exception - posted by Shivansh Srivastava <sh...@knoldus.com> on 2016/11/07 07:15:08 UTC, 1 replies.
- Re: Already subscribed to user@spark.apache.org - posted by Maitray Thaker <ma...@gmail.com> on 2016/11/07 07:57:54 UTC, 0 replies.
- spark optimization - posted by maitraythaker <ma...@gmail.com> on 2016/11/07 08:07:03 UTC, 0 replies.
- mapWithState with a big initial RDD gets OOM'ed - posted by Daniel Haviv <da...@veracity-group.com> on 2016/11/07 08:30:46 UTC, 0 replies.
- VectorUDT and ml.Vector - posted by Ganesh <ma...@ganeshkrishnan.com> on 2016/11/07 13:25:37 UTC, 1 replies.
- Spark master shows 0 cores for executors - posted by Rohit Verma <ro...@rokittech.com> on 2016/11/07 14:26:29 UTC, 0 replies.
- Spark with Ranger - posted by Mudit Kumar <mk...@sapient.com> on 2016/11/07 15:23:08 UTC, 1 replies.
- How sensitive is Spark to Swap? - posted by Michael Segel <ms...@hotmail.com> on 2016/11/07 17:28:59 UTC, 0 replies.
- Re: How sensitive is Spark to Swap? - posted by Sean Owen <so...@cloudera.com> on 2016/11/07 17:45:08 UTC, 0 replies.
- Access_Remote_Kerberized_Cluster_Through_Spark - posted by Ajay Chander <it...@gmail.com> on 2016/11/07 21:37:47 UTC, 3 replies.
- Spark Streaming Data loss on failure to write BlockAdditionEvent failure to WAL - posted by Arijit <Ar...@live.com> on 2016/11/07 22:04:41 UTC, 0 replies.
- Anomalous Spark RDD persistence behavior - posted by Dave Jaffe <dj...@vmware.com> on 2016/11/07 22:07:25 UTC, 2 replies.
- Using Apache Spark Streaming - how to handle changing data format within stream - posted by coolgar <ka...@gmail.com> on 2016/11/07 22:22:49 UTC, 2 replies.
- Does DeserializeToObject mean that a Row is deserialized to Java objects? - posted by Benyi Wang <be...@gmail.com> on 2016/11/07 22:24:29 UTC, 0 replies.
- Spark ML - Naive Bayes - how to select Threshold values - posted by Nirav Patel <np...@xactlycorp.com> on 2016/11/07 23:07:51 UTC, 0 replies.
- Correct SparkLauncher usage - posted by Mohammad Tariq <do...@gmail.com> on 2016/11/07 23:29:29 UTC, 10 replies.
- Structured Streaming with Cassandra, Is it supported?? - posted by shyla deshpande <de...@gmail.com> on 2016/11/07 23:33:21 UTC, 2 replies.
- VectorUDT and ml.Vector for SVD - posted by ganeshkrishnan <ma...@ganeshkrishnan.com> on 2016/11/07 23:51:21 UTC, 0 replies.
- Re: Spark Streaming Data loss on failure to write BlockAdditionEvent failure to WAL - posted by Tathagata Das <ta...@gmail.com> on 2016/11/08 02:59:06 UTC, 4 replies.
- TallSkinnyQR - posted by im281 <im...@gmail.com> on 2016/11/08 03:50:05 UTC, 7 replies.
- [ANNOUNCE] Announcing Apache Spark 1.6.3 - posted by Reynold Xin <rx...@databricks.com> on 2016/11/08 06:07:40 UTC, 0 replies.
- Why active tasks is bigger than cores? - posted by 涂小刚 <sh...@gmail.com> on 2016/11/08 07:14:56 UTC, 0 replies.
- Kafka stream offset management question - posted by Haopu Wang <HW...@qilinsoft.com> on 2016/11/08 08:21:42 UTC, 1 replies.
- DataSet toJson - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/11/08 13:06:29 UTC, 1 replies.
- how to write a substring search efficiently? - posted by Haig Didizian <ha...@didizian.com> on 2016/11/08 13:36:07 UTC, 0 replies.
- Spark streaming uses lesser number of executors - posted by Aravindh <ma...@aravindh.io> on 2016/11/08 13:59:48 UTC, 0 replies.
- Live data visualisations with Spark - posted by Andrew Holway <an...@otternetworks.de> on 2016/11/08 16:13:51 UTC, 2 replies.
- use case reading files split per id - posted by ruben <ru...@pandora.be> on 2016/11/08 17:11:45 UTC, 3 replies.
- read large number of files on s3 - posted by Xiaomeng Wan <sh...@gmail.com> on 2016/11/08 17:31:05 UTC, 0 replies.
- mapWithState with Datasets - posted by Daniel Haviv <da...@veracity-group.com> on 2016/11/08 17:46:08 UTC, 2 replies.
- Running - posted by rurbanow <ru...@gmail.com> on 2016/11/08 18:40:11 UTC, 0 replies.
- Re: GraphX Connected Components - posted by Robineast <Ro...@xense.co.uk> on 2016/11/08 20:06:04 UTC, 0 replies.
- GraphX and Public Transport Shortest Paths - posted by Gerard Casey <ge...@me.com> on 2016/11/08 20:12:03 UTC, 0 replies.
- Convert RDD of numpy matrices to Dataframes - posted by aditya1702 <ad...@gmail.com> on 2016/11/08 20:37:15 UTC, 0 replies.
- How Spark determines Parquet partition size - posted by Selvam Raman <se...@gmail.com> on 2016/11/08 21:40:43 UTC, 0 replies.
- Save a spark RDD to disk - posted by Elf Of Lothlorein <re...@gmail.com> on 2016/11/08 22:08:14 UTC, 2 replies.
- spark ml - ngram - how to preserve single word (1-gram) - posted by Nirav Patel <np...@xactlycorp.com> on 2016/11/08 22:41:03 UTC, 0 replies.
- Re: Any Dynamic Compilation of Scala Query - posted by Mahender Sarangam <Ma...@outlook.com> on 2016/11/08 23:22:23 UTC, 0 replies.
- Splines or Smoothing Kernels for Linear Regression - posted by Tobi Bosede <an...@gmail.com> on 2016/11/09 01:14:01 UTC, 0 replies.
- Strongly Connected Components - posted by Shreya Agarwal <sh...@microsoft.com> on 2016/11/09 05:04:59 UTC, 9 replies.
- Spark streaming delays spikes - posted by "Shlomi.b" <sh...@gigya-inc.com> on 2016/11/09 08:48:55 UTC, 0 replies.
- installing spark-jobserver on cdh 5.7 and yarn - posted by Reza zade <kn...@gmail.com> on 2016/11/09 09:59:24 UTC, 1 replies.
- Application config management - posted by Erwan ALLAIN <ea...@gmail.com> on 2016/11/09 11:06:15 UTC, 0 replies.
- Physical plan for windows and joins - how to know which is faster? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/11/09 12:36:47 UTC, 1 replies.
- javac - No such file or directory - posted by Andrew Holway <an...@otternetworks.de> on 2016/11/09 13:43:42 UTC, 1 replies.
- importing data into hdfs/spark using Informatica ETL tool - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/09 13:56:37 UTC, 6 replies.
- How to impersonate a user from a Spark program - posted by Samy Dindane <sa...@dindane.com> on 2016/11/09 15:20:36 UTC, 1 replies.
- Aggregations on every column on dataframe causing StackOverflowError - posted by Raviteja Lokineni <ra...@gmail.com> on 2016/11/09 15:48:16 UTC, 4 replies.
- Issue Running sparkR on YARN - posted by Ia...@tdameritrade.com on 2016/11/09 20:11:18 UTC, 1 replies.
- How to interpret the Time Line on "Details for Stage" Spark UI page - posted by Xiaoye Sun <su...@gmail.com> on 2016/11/09 21:51:28 UTC, 0 replies.
- Hive Queries are running very slowly in Spark 2.0 - posted by Jaya Shankar Vadisela <jv...@innominds.com> on 2016/11/10 06:02:53 UTC, 1 replies.
- Unable to lauch Python Web Application on Spark Cluster - posted by anjali gautam <an...@gmail.com> on 2016/11/10 06:31:20 UTC, 2 replies.
- how to merge dataframe write output files - posted by lk_spark <lk...@163.com> on 2016/11/10 07:28:35 UTC, 5 replies.
- Akka Stream as the source for Spark Streaming. Please advice... - posted by shyla deshpande <de...@gmail.com> on 2016/11/10 07:46:25 UTC, 8 replies.
- Swift question regarding in-memory snapshots of compact table data - posted by Daniel Schulz <da...@hotmail.com> on 2016/11/10 07:50:19 UTC, 0 replies.
- Re: If we run sc.textfile(path,xxx) many times, will the elements be the same in each partition - posted by Prashant Sharma <sc...@gmail.com> on 2016/11/10 14:20:44 UTC, 0 replies.
- will spark aggregate and treeaggregate case a shuflle action? - posted by codlife <10...@qq.com> on 2016/11/10 15:45:12 UTC, 0 replies.
- Spark Streaming: question on sticky session across batches ? - posted by Manish Malhotra <ma...@gmail.com> on 2016/11/10 16:42:49 UTC, 3 replies.
- type-safe join in the new DataSet API? - posted by Yang <te...@gmail.com> on 2016/11/10 18:44:30 UTC, 2 replies.
- UDF with column value comparison fails with PySpark - posted by Perttu Ranta-aho <ra...@iki.fi> on 2016/11/10 19:14:02 UTC, 2 replies.
- Anyone using ProtoBuf for Kafka messages with Spark Streaming for processing? - posted by shyla deshpande <de...@gmail.com> on 2016/11/10 20:20:36 UTC, 0 replies.
- Joining to a large, pre-sorted file - posted by Stuart White <st...@gmail.com> on 2016/11/10 22:45:02 UTC, 9 replies.
- load large number of files from s3 - posted by Xiaomeng Wan <sh...@gmail.com> on 2016/11/11 13:08:10 UTC, 1 replies.
- Possible DR solution - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/11 14:56:08 UTC, 15 replies.
- How to use Spark SQL to connect to Cassandra from Spark-Shell? - posted by kant kodali <ka...@gmail.com> on 2016/11/11 16:04:47 UTC, 5 replies.
- Kafka Producer within a docker Instance - posted by Raghav <ra...@gmail.com> on 2016/11/11 16:19:51 UTC, 0 replies.
- Dataset API | Setting number of partitions during join/groupBy - posted by Aniket Bhatnagar <an...@gmail.com> on 2016/11/11 17:22:14 UTC, 2 replies.
- RDD to HDFS - Kerberos - authentication error - RetryInvocationHandler - posted by Gerard Casey <ge...@me.com> on 2016/11/11 17:48:15 UTC, 0 replies.
- DataSet is not able to handle 50,000 columns to sum - posted by Anil Langote <an...@gmail.com> on 2016/11/11 17:57:23 UTC, 0 replies.
- Finding a Spark Equivalent for Pandas' get_dummies - posted by Nicholas Sharkey <ni...@gmail.com> on 2016/11/11 18:27:21 UTC, 4 replies.
- appHandle.kill(), SparkSubmit Process, JVM questions related to SparkLauncher design and Spark Driver - posted by Elkhan Dadashov <el...@gmail.com> on 2016/11/11 22:49:20 UTC, 0 replies.
- Re: DataSet is not able to handle 50,000 columns to sum - posted by ayan guha <gu...@gmail.com> on 2016/11/12 00:10:20 UTC, 1 replies.
- pyspark: accept unicode column names in DataFrame.corr and cov - posted by SamPenrose <sp...@mozilla.com> on 2016/11/12 00:36:25 UTC, 1 replies.
- SparkDriver memory calculation mismatch - posted by Elkhan Dadashov <el...@gmail.com> on 2016/11/12 02:18:21 UTC, 4 replies.
- Exception not failing Python applications (in yarn client mode) - SparkLauncher says app succeeded, where app actually has failed - posted by Elkhan Dadashov <el...@gmail.com> on 2016/11/12 03:32:16 UTC, 1 replies.
- Spark joins using row id - posted by Rohit Verma <ro...@rokittech.com> on 2016/11/12 11:11:14 UTC, 2 replies.
- Spark Streaming- ReduceByKey not removing Duplicates for the same key in a Batch - posted by dev loper <sp...@gmail.com> on 2016/11/12 12:36:27 UTC, 5 replies.
- toDebugString is clipped - posted by Anirudh Perugu <an...@stonybrook.edu> on 2016/11/13 01:56:02 UTC, 1 replies.
- spark-shell not starting ( in a Kali linux 2 OS) - posted by Kelum Perera <ke...@gmail.com> on 2016/11/13 04:50:26 UTC, 4 replies.
- Re: Spark stalling during shuffle (maybe a memory issue) - posted by bogdanbaraila <bo...@gmail.com> on 2016/11/13 09:54:35 UTC, 0 replies.
- Nearest neighbour search - posted by Meeraj Kunnumpurath <me...@servicesymphony.com> on 2016/11/13 15:04:07 UTC, 4 replies.
- Spark SQL shell hangs - posted by rakesh sharma <ra...@hotmail.com> on 2016/11/13 15:20:41 UTC, 1 replies.
- [ANNOUNCE] Apache SystemML 0.11.0-incubating released - posted by Luciano Resende <lr...@apache.org> on 2016/11/13 18:28:41 UTC, 0 replies.
- receiver based spark streaming doubts - posted by Shushant Arora <sh...@gmail.com> on 2016/11/13 19:04:54 UTC, 0 replies.
- sbt shenanigans for a Spark-based project - posted by Marco Mistroni <mm...@gmail.com> on 2016/11/13 21:01:53 UTC, 4 replies.
- Re: Spark ML : One hot Encoding for multiple columns - posted by janardhan shetty <ja...@gmail.com> on 2016/11/14 00:55:11 UTC, 1 replies.
- Convert SparseVector column to Densevector column - posted by janardhan shetty <ja...@gmail.com> on 2016/11/14 04:20:58 UTC, 2 replies.
- handle data skew problem when calculating word count and word dependency - posted by "ruan.answer" <ru...@gmail.com> on 2016/11/14 07:26:28 UTC, 0 replies.
- AVRO File size when caching in-memory - posted by Prithish <pr...@gmail.com> on 2016/11/14 09:05:22 UTC, 8 replies.
- SparkSQL: intra-SparkSQL-application table registration - posted by Mohamed Nadjib Mami <ma...@iai.uni-bonn.de> on 2016/11/14 10:03:37 UTC, 0 replies.
- Two questions about running spark on mesos - posted by Yu Wei <yu...@hotmail.com> on 2016/11/14 11:10:34 UTC, 0 replies.
- Spark hash function - posted by Rohit Verma <ro...@rokittech.com> on 2016/11/14 11:21:00 UTC, 0 replies.
- Grouping Set - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/11/14 15:20:39 UTC, 3 replies.
- Re: took more time to get data from spark dataset to driver program - posted by Rishikesh Teke <ri...@gmail.com> on 2016/11/14 16:27:57 UTC, 0 replies.
- Spark streaming data loss due to timeout in writing BlockAdditionEvent to WAL by the driver - posted by Arijit <Ar...@live.com> on 2016/11/14 18:24:48 UTC, 0 replies.
- scala.MatchError while doing BinaryClassificationMetrics - posted by Bhaarat Sharma <bh...@gmail.com> on 2016/11/14 18:30:37 UTC, 4 replies.
- Re: Two questions about running spark on mesos - posted by Michael Gummelt <mg...@mesosphere.io> on 2016/11/14 18:40:37 UTC, 0 replies.
- Pasting oddity with Spark 2.0 (scala) - posted by jggg777 <jo...@gmail.com> on 2016/11/14 20:09:03 UTC, 0 replies.
- Cannot find Native Library in "cluster" deploy-mode - posted by jtgenesis <jt...@gmail.com> on 2016/11/14 21:41:20 UTC, 3 replies.
- Spark SQL UDF - passing map as a UDF parameter - posted by Nirav Patel <np...@xactlycorp.com> on 2016/11/15 00:33:35 UTC, 2 replies.
- [ANNOUNCE] Apache Spark 2.0.2 - posted by Reynold Xin <rx...@databricks.com> on 2016/11/15 05:14:24 UTC, 0 replies.
- mapWithState job slows down & exceeds yarn's memory limits - posted by Daniel Haviv <da...@veracity-group.com> on 2016/11/15 06:44:48 UTC, 0 replies.
- HiveContext.getOrCreate not accessible - posted by Praseetha <pr...@gmail.com> on 2016/11/15 07:17:32 UTC, 1 replies.
- How to read a Multi Line json object via Spark - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/11/15 07:20:59 UTC, 2 replies.
- Simple "state machine" functionality using Scala or Python - posted by Esa Heikkinen <es...@student.tut.fi> on 2016/11/15 09:43:34 UTC, 0 replies.
- Fwd: - posted by Anton Okolnychyi <an...@gmail.com> on 2016/11/15 11:26:59 UTC, 2 replies.
- Straming - Stop when there's no more data - posted by Ashic Mahtab <as...@live.com> on 2016/11/15 11:29:23 UTC, 0 replies.
- Spark R guidelines for non-spark functions and coxph (Cox Regression for Time-Dependent Covariates) - posted by pietrop <pi...@gmail.com> on 2016/11/15 11:56:27 UTC, 2 replies.
- Exclude certain data from Training Data - Mlib - posted by Bhaarat Sharma <bh...@gmail.com> on 2016/11/15 13:28:33 UTC, 0 replies.
- creating a javaRDD using newAPIHadoopFile and FixedLengthInputFormat - posted by David Robison <da...@psgglobal.net> on 2016/11/15 13:44:40 UTC, 0 replies.
- CSV to parquet preserving partitioning - posted by benoitdr <be...@nokia.com> on 2016/11/15 13:44:40 UTC, 7 replies.
- SQL analyzer breakdown - posted by Koert Kuipers <ko...@tresata.com> on 2016/11/15 15:45:00 UTC, 0 replies.
- Running stress tests on spark cluster to avoid wild-goose chase later - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/15 17:09:55 UTC, 1 replies.
- GraphX updating vertex property - posted by Saliya Ekanayake <es...@gmail.com> on 2016/11/15 17:24:26 UTC, 0 replies.
- Spark SQL and JDBC compatibility - posted by "herman.yu@teeupdata.com" <he...@teeupdata.com> on 2016/11/15 17:29:30 UTC, 0 replies.
- Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors? - posted by Russell Jurney <ru...@gmail.com> on 2016/11/15 18:06:10 UTC, 3 replies.
- Log-loss for multiclass classification - posted by janardhan shetty <ja...@gmail.com> on 2016/11/15 19:15:51 UTC, 1 replies.
- Problem submitting a spark job using yarn-client as master - posted by David Robison <da...@psgglobal.net> on 2016/11/15 21:45:28 UTC, 2 replies.
- Access broadcast variable from within function passed to reduceByKey - posted by coolgar <ka...@gmail.com> on 2016/11/15 23:57:49 UTC, 0 replies.
- Spark-xml - OutOfMemoryError: Requested array size exceeds VM limit - posted by Arun Patel <ar...@gmail.com> on 2016/11/16 00:12:49 UTC, 4 replies.
- Re: Does the delegator map task of SparkLauncher need to stay alive until Spark job finishes ? - posted by Elkhan Dadashov <el...@gmail.com> on 2016/11/16 01:57:46 UTC, 2 replies.
- what is the optimized way to combine multiple dataframes into one dataframe ? - posted by "Devi P.V" <de...@gmail.com> on 2016/11/16 07:05:47 UTC, 2 replies.
- Writing parquet table using spark - posted by Vaibhav Sinha <ma...@gmail.com> on 2016/11/16 08:40:32 UTC, 1 replies.
- How do I convert json_encoded_blob_column into a data frame? (This may be a feature request) - posted by kant kodali <ka...@gmail.com> on 2016/11/16 09:44:39 UTC, 6 replies.
- Map and MapParitions with partition-local variable - posted by Zsolt Tóth <to...@gmail.com> on 2016/11/16 11:59:09 UTC, 2 replies.
- problem deploying spark-jobserver on CentOS - posted by Reza zade <kn...@gmail.com> on 2016/11/16 13:09:30 UTC, 0 replies.
- HttpFileServer behavior in 1.6.3 - posted by Kai Wang <de...@gmail.com> on 2016/11/16 14:39:33 UTC, 0 replies.
- Spark UI shows Jobs are processing, but the files are already written to S3 - posted by Kuchekar <ku...@gmail.com> on 2016/11/16 18:00:25 UTC, 1 replies.
- Need guidelines in Spark Streaming and Kafka integration - posted by "Karim, Md. Rezaul" <re...@insight-centre.org> on 2016/11/16 18:18:21 UTC, 3 replies.
- [SQL/Catalyst] Janino Generated Code Debugging - posted by Aleksander Eskilson <al...@gmail.com> on 2016/11/16 18:59:30 UTC, 1 replies.
- Spark 2.0.2 with Kafka source, Error please help! - posted by shyla deshpande <de...@gmail.com> on 2016/11/16 19:16:00 UTC, 7 replies.
- RE: submitting a spark job using yarn-client and getting NoClassDefFoundError: org/apache/spark/Logging - posted by David Robison <da...@psgglobal.net> on 2016/11/16 20:05:10 UTC, 0 replies.
- How to propagate R_LIBS to sparkr executors - posted by Rodrick Brown <ro...@orchard-app.com> on 2016/11/16 21:00:47 UTC, 1 replies.
- Any with S3 experience with Spark? Having ListBucket issues - posted by Edden Burrow <ed...@gmail.com> on 2016/11/16 22:34:10 UTC, 1 replies.
- SparkILoop doesn't run - posted by Mohit Jaggi <mo...@gmail.com> on 2016/11/16 22:47:20 UTC, 5 replies.
- Re: Kafka segmentation - posted by Cody Koeninger <co...@koeninger.org> on 2016/11/17 00:22:47 UTC, 9 replies.
- Configure spark.kryoserializer.buffer.max at runtime does not take effect - posted by bluishpenguin <bl...@gmail.com> on 2016/11/17 03:55:22 UTC, 2 replies.
- Best practice for preprocessing feature with DataFrame - posted by "颜发才 (Yan Facai)" <ya...@gmail.com> on 2016/11/17 04:08:26 UTC, 5 replies.
- How does predicate push down really help? - posted by kant kodali <ka...@gmail.com> on 2016/11/17 06:03:22 UTC, 6 replies.
- Nested UDFs - posted by Perttu Ranta-aho <ra...@iki.fi> on 2016/11/17 07:14:38 UTC, 3 replies.
- why is method predict protected in PredictionModel - posted by wobu <bu...@gmail.com> on 2016/11/17 09:24:35 UTC, 1 replies.
- Another Interesting Question on SPARK SQL - posted by kant kodali <ka...@gmail.com> on 2016/11/17 09:28:30 UTC, 0 replies.
- Join Query - posted by Aakash Basu <aa...@gmail.com> on 2016/11/17 09:47:17 UTC, 1 replies.
- HDPCD SPARK Certification Queries - posted by Aakash Basu <aa...@gmail.com> on 2016/11/17 10:11:51 UTC, 1 replies.
- Handling windows characters with Spark CSV on Linux - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/17 13:11:21 UTC, 4 replies.
- outlier detection using StreamingKMeans - posted by Debasish Ghosh <gh...@gmail.com> on 2016/11/17 14:03:10 UTC, 0 replies.
- ClassCastException when using SparkSQL Window function - posted by Isabelle Phan <nl...@gmail.com> on 2016/11/17 14:20:16 UTC, 0 replies.
- newAPIHadoopFile throws a JsonMappingException: Infinite recursion (StackOverflowError) error - posted by David Robison <da...@psgglobal.net> on 2016/11/17 15:11:56 UTC, 0 replies.
- Fwd: Spark Partitioning Strategy with Parquet - posted by titli batali <ti...@gmail.com> on 2016/11/17 15:25:39 UTC, 3 replies.
- Re: Spark SQL join and subquery - posted by neil90 <ne...@icloud.com> on 2016/11/17 16:26:12 UTC, 1 replies.
- Fill na with last value - posted by Georg Heiler <ge...@gmail.com> on 2016/11/17 16:36:34 UTC, 0 replies.
- Using mapWithState without a checkpoint - posted by Daniel Haviv <da...@veracity-group.com> on 2016/11/17 17:45:54 UTC, 0 replies.
- Fill nan with last (good) value - posted by geoHeil <ge...@gmail.com> on 2016/11/17 17:57:02 UTC, 0 replies.
- analysing ibm mq messages using spark streaming - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/17 18:34:42 UTC, 0 replies.
- How to load only the data of the last partition - posted by Samy Dindane <sa...@dindane.com> on 2016/11/17 19:05:38 UTC, 3 replies.
- replace some partitions when writing dataframe - posted by Koert Kuipers <ko...@tresata.com> on 2016/11/17 19:09:33 UTC, 0 replies.
- Spark AVRO S3 read not working for partitioned data - posted by "Jain, Nishit" <nj...@underarmour.com> on 2016/11/17 19:11:02 UTC, 1 replies.
- Spark 2.0.2, Structured Streaming with kafka source... Unable to parse the value to Object.. - posted by shyla deshpande <de...@gmail.com> on 2016/11/17 19:30:34 UTC, 2 replies.
- does column order matter in dataframe.repartition? - posted by Cesar <ce...@gmail.com> on 2016/11/17 19:41:17 UTC, 1 replies.
- Spark Submit --> Unable to reach cluster manager to request executors - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/11/17 20:15:52 UTC, 0 replies.
- kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic - posted by Hster Geguri <hs...@gmail.com> on 2016/11/17 23:58:10 UTC, 1 replies.
- Long-running job OOMs driver process - posted by Irina Truong <ir...@parsely.com> on 2016/11/18 01:51:18 UTC, 7 replies.
- GraphX Pregel not update vertex state properly, cause messages loss - posted by fuz_woo <fu...@qq.com> on 2016/11/18 03:47:14 UTC, 5 replies.
- Is selecting different datasets from same parquet file blocking. - posted by Rohit Verma <ro...@rokittech.com> on 2016/11/18 04:21:24 UTC, 0 replies.
- sort descending with multiple columns - posted by Sreekanth Jella <js...@gmail.com> on 2016/11/18 07:15:01 UTC, 3 replies.
- How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL - posted by kant kodali <ka...@gmail.com> on 2016/11/18 07:42:29 UTC, 6 replies.
- Re: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 - posted by Phillip Henry <lo...@gmail.com> on 2016/11/18 08:20:10 UTC, 0 replies.
- Kafka direct approach,App UI shows wrong input rate - posted by Julian Keppel <ju...@gmail.com> on 2016/11/18 10:38:45 UTC, 2 replies.
- Sporadic ClassNotFoundException with Kryo - posted by chrism <ch...@cics.se> on 2016/11/18 14:09:09 UTC, 0 replies.
- DataFrame select non-existing column - posted by Kristoffer Sjögren <st...@gmail.com> on 2016/11/18 14:32:01 UTC, 8 replies.
- Issue in application deployment on spark cluster - posted by Anjali Gautam <an...@gmail.com> on 2016/11/18 15:16:24 UTC, 1 replies.
- Will spark cache table once even if I call read/cache on the same table multiple times - posted by Rabin Banerjee <de...@gmail.com> on 2016/11/18 15:36:14 UTC, 4 replies.
- Successful streaming with ibm/ mq to flume then to kafka and finally spark streaming - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/18 16:53:12 UTC, 0 replies.
- How to expose Spark-Shell in the production? - posted by kant kodali <ka...@gmail.com> on 2016/11/18 19:26:38 UTC, 1 replies.
- Run spark with hadoop snapshot - posted by lminer <lm...@hotmail.com> on 2016/11/18 19:31:28 UTC, 2 replies.
- Spark driver not reusing HConnection - posted by Mukesh Jha <me...@gmail.com> on 2016/11/18 22:37:00 UTC, 3 replies.
- java.lang.OutOfMemoryError: Java heap space - posted by Kürşat Kurt <ku...@kursatkurt.com> on 2016/11/19 00:47:44 UTC, 2 replies.
- Reading LZO files with Spark - posted by learning_spark <di...@gmail.com> on 2016/11/19 04:20:39 UTC, 1 replies.
- Stateful aggregations with Structured Streaming - posted by "Yuval.Itzchakov" <yu...@gmail.com> on 2016/11/19 13:46:48 UTC, 1 replies.
- using StreamingKMeans - posted by debasishg <gh...@gmail.com> on 2016/11/19 16:46:41 UTC, 7 replies.
- Usage of mllib api in ml - posted by janardhan shetty <ja...@gmail.com> on 2016/11/19 17:03:02 UTC, 4 replies.
- Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic - posted by Hster Geguri <hs...@gmail.com> on 2016/11/19 17:12:39 UTC, 2 replies.
- covert local tsv file to orc file on distributed cloud storage(openstack). - posted by vr spark <vr...@gmail.com> on 2016/11/19 17:21:02 UTC, 3 replies.
- Logistic Regression Match Error - posted by Meeraj Kunnumpurath <me...@servicesymphony.com> on 2016/11/19 18:10:00 UTC, 3 replies.
- HPC with Spark? Simultaneous, parallel one to one mapping of partition to vcore - posted by Adam Smith <ad...@gmail.com> on 2016/11/20 00:44:50 UTC, 1 replies.
- Create a Column expression from a String - posted by Stuart White <st...@gmail.com> on 2016/11/20 02:12:30 UTC, 3 replies.
- Using Flume as Input Stream to Spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/20 10:12:58 UTC, 0 replies.
- Error in running twitter streaming job - posted by "Kappaganthu, Sivaram (ES)" <Si...@ADP.com> on 2016/11/20 11:09:53 UTC, 3 replies.
- Re: Flume integration - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/11/20 11:30:44 UTC, 9 replies.
- How do I access the nested field in a dataframe, spark Streaming app... Please help. - posted by shyla deshpande <de...@gmail.com> on 2016/11/20 17:59:52 UTC, 3 replies.
- Fwd: Yarn resource utilization with Spark pipe() - posted by Sameer Choudhary <sa...@gmail.com> on 2016/11/20 22:02:14 UTC, 5 replies.
- Linear regression + Janino Exception - posted by janardhan shetty <ja...@gmail.com> on 2016/11/21 02:09:03 UTC, 2 replies.
- dataframe data visualization - posted by "wenli.oywl@alibaba-inc.com" <we...@alibaba-inc.com> on 2016/11/21 02:23:58 UTC, 2 replies.
- Re:Re: Multiple streaming aggregations in structured streaming - posted by Xinyu Zhang <ws...@163.com> on 2016/11/21 07:51:28 UTC, 3 replies.
- Will different receivers run on different worker? - posted by Cyanny LIANG <lg...@gmail.com> on 2016/11/21 08:12:24 UTC, 0 replies.
- Pasting into spark-shell doesn't work for Databricks example - posted by jggg777 <jo...@gmail.com> on 2016/11/21 17:23:03 UTC, 6 replies.
- How to write a custom file system? - posted by Samy Dindane <sa...@dindane.com> on 2016/11/21 17:26:10 UTC, 3 replies.
- Starting a new Spark codebase, Python or Scala / Java? - posted by Brandon White <bw...@gmail.com> on 2016/11/21 18:51:54 UTC, 2 replies.
- Cluster deploy mode driver location - posted by Sa...@wellsfargo.com on 2016/11/21 19:04:06 UTC, 2 replies.
- Potential memory leak in yarn ApplicationMaster - posted by Spark User <sp...@gmail.com> on 2016/11/21 21:03:13 UTC, 0 replies.
- RDD Partitions on HDFS file in Hive on Spark Query - posted by yeshwanth kumar <ye...@gmail.com> on 2016/11/21 22:17:56 UTC, 7 replies.
- Re: RDD Partitions not distributed evenly to executors - posted by Thunder Stumpges <th...@gmail.com> on 2016/11/22 02:19:51 UTC, 0 replies.
- newbie question about RDD - posted by Raghav <ra...@gmail.com> on 2016/11/22 05:45:03 UTC, 2 replies.
- code generation memory issue - posted by geoHeil <ge...@gmail.com> on 2016/11/22 07:18:10 UTC, 0 replies.
- two spark-shells spark on mesos not working - posted by John Yost <ho...@gmail.com> on 2016/11/22 12:52:44 UTC, 1 replies.
- [Spark Streaming] map and window operation on DStream only process one batch - posted by Hao Ren <in...@gmail.com> on 2016/11/22 13:48:25 UTC, 0 replies.
- Is there a processing speed difference between DataFrames and Datasets? - posted by jggg777 <jo...@gmail.com> on 2016/11/22 14:50:11 UTC, 1 replies.
- find outliers within data - posted by anup ahire <ah...@gmail.com> on 2016/11/22 16:00:19 UTC, 1 replies.
- how does create dataframe from scala collection handle executor failure? - posted by "Mendelson, Assaf" <As...@rsa.com> on 2016/11/22 16:34:22 UTC, 0 replies.
- parallelizing model training .. - posted by debasishg <gh...@gmail.com> on 2016/11/22 18:27:04 UTC, 0 replies.
- Pregel Question - posted by Saliya Ekanayake <es...@gmail.com> on 2016/11/22 20:08:15 UTC, 1 replies.
- Fault-tolerant Accumulators in stateful operators. - posted by Amit Sela <am...@gmail.com> on 2016/11/22 20:49:15 UTC, 0 replies.
- How do I persist the data after I process the data with Structured streaming... - posted by shyla deshpande <de...@gmail.com> on 2016/11/22 20:55:08 UTC, 3 replies.
- getting error on spark streaming : java.lang.OutOfMemoryError: unable to create new native thread - posted by Mohit Durgapal <du...@gmail.com> on 2016/11/22 21:42:53 UTC, 1 replies.
- Any equivalent method lateral and explore - posted by Mahender Sarangam <Ma...@outlook.com> on 2016/11/22 22:42:56 UTC, 2 replies.
- [Spark MLlib]: Does Spark MLlib supports nonlinear optimization with nonlinear constraints. - posted by hi...@accenture.com on 2016/11/23 04:07:08 UTC, 0 replies.
- Spark application shows success but lots of tasks are skipped in UI - posted by Lantao Jin <ji...@gmail.com> on 2016/11/23 13:45:48 UTC, 2 replies.
- subtractByKey modifes values in the source RDD - posted by Dmitry Dzhus <di...@dzhus.org> on 2016/11/23 16:15:49 UTC, 0 replies.
- how to see Pipeline model information - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/11/23 17:21:57 UTC, 4 replies.
- spark sql jobs heap memory - posted by Koert Kuipers <ko...@tresata.com> on 2016/11/23 18:53:11 UTC, 1 replies.
- spark.yarn.executor.memoryOverhead - posted by Koert Kuipers <ko...@tresata.com> on 2016/11/23 19:01:29 UTC, 1 replies.
- Mapping KMean trained-data to respective records - posted by Reth RM <re...@gmail.com> on 2016/11/23 20:22:44 UTC, 0 replies.
- Spark Shell doesnt seem to use spark workers but Spark Submit does. - posted by kant kodali <ka...@gmail.com> on 2016/11/23 20:45:40 UTC, 1 replies.
- Is there any api for categorical column statistic ? - posted by canan chen <cc...@gmail.com> on 2016/11/24 01:22:32 UTC, 0 replies.
- Invalid log directory running pyspark job - posted by Stephen Boesch <ja...@gmail.com> on 2016/11/24 03:36:31 UTC, 1 replies.
- Apache Spark SQL is taking forever to count billion rows from Cassandra? - posted by kant kodali <ka...@gmail.com> on 2016/11/24 08:03:30 UTC, 7 replies.
- PySpark TaskContext - posted by ofer <of...@gmail.com> on 2016/11/24 09:39:51 UTC, 6 replies.
- io.netty.handler.codec.EncoderException: java.lang.NoSuchMethodError: - posted by Karthik Shyamsunder <ka...@gmail.com> on 2016/11/24 13:35:57 UTC, 0 replies.
- .netty.handler.codec.EncoderException: java.lang.NoSuchMethodError - posted by kshyamsunder <ka...@gmail.com> on 2016/11/24 13:46:38 UTC, 0 replies.
- Re: Fwd: Spark SQL: ArrayIndexOutofBoundsException - posted by cossy <co...@163.com> on 2016/11/24 13:51:37 UTC, 0 replies.
- multiple Spark Thrift Servers running in the same machine throws org.apache.hadoop.security.AccessControlException - posted by 谭成灶 <ta...@live.cn> on 2016/11/24 14:27:57 UTC, 0 replies.
- Hive on Spark is not populating correct records - posted by Vikash Pareek <vi...@infoobjects.com> on 2016/11/24 15:09:25 UTC, 0 replies.
- OS killing Executor due to high (possibly off heap) memory usage - posted by Aniket Bhatnagar <an...@gmail.com> on 2016/11/24 16:16:57 UTC, 4 replies.
- get specific tree or forest structure from pipeline model - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/11/24 17:27:21 UTC, 1 replies.
- Kryo Exception: NegativeArraySizeException - posted by Pedro Tuero <tu...@gmail.com> on 2016/11/24 21:24:13 UTC, 0 replies.
- New Contributor - posted by Manolis Gemeliaris <ge...@gmail.com> on 2016/11/25 00:38:03 UTC, 1 replies.
- Re: Does SparkR or SparkMLib support nonlinear optimization with non linear constraints - posted by Robineast <Ro...@xense.co.uk> on 2016/11/25 11:05:51 UTC, 0 replies.
- [StackOverflow] Size exceeds Integer.MAX_VALUE When Joining 2 Large DFs - posted by Gerard Maas <ge...@gmail.com> on 2016/11/25 12:05:54 UTC, 0 replies.
- RDD persist() not honoured - posted by Io...@nomura.com on 2016/11/25 14:23:17 UTC, 0 replies.
- Multilabel classification with Spark MLlib - posted by "Md. Rezaul Karim" <re...@insight-centre.org> on 2016/11/25 17:27:34 UTC, 2 replies.
- Tracking opened files by Spark application - posted by David Lauzon <da...@gmail.com> on 2016/11/25 18:28:26 UTC, 0 replies.
- Update Cassandra null value - posted by Tomas Carini <to...@gmail.com> on 2016/11/25 18:55:44 UTC, 0 replies.
- Re: Third party library - posted by Reynold Xin <rx...@databricks.com> on 2016/11/25 23:32:00 UTC, 7 replies.
- Why is shuffle write size so large when joining Dataset with nested structure? - posted by taozhuo <ta...@gmail.com> on 2016/11/26 02:16:21 UTC, 2 replies.
- Apache Spark or Spark-Cassandra-Connector doesnt look like it is reading multiple partitions in parallel. - posted by kant kodali <ka...@gmail.com> on 2016/11/26 08:34:34 UTC, 2 replies.
- UDF for gradient ascent - posted by Meeraj Kunnumpurath <me...@servicesymphony.com> on 2016/11/26 15:31:50 UTC, 1 replies.
- Dataframe broadcast join hint not working - posted by Swapnil Shinde <sw...@gmail.com> on 2016/11/26 18:51:07 UTC, 6 replies.
- Re: confirm subscribe to user@spark.apache.org - posted by Arthur Țițeică <na...@cloud.titeica.ro> on 2016/11/27 07:13:29 UTC, 0 replies.
- how to print auc & prc for GBTClassifier, which is okay for RandomForestClassifier - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/11/27 17:52:35 UTC, 1 replies.
- createDataFrame causing a strange error. - posted by Andrew Holway <an...@otternetworks.de> on 2016/11/27 19:32:20 UTC, 5 replies.
- Spark ignoring partition names without equals (=) separator - posted by Prasanna Santhanam <ts...@apache.org> on 2016/11/28 04:18:02 UTC, 4 replies.
- if conditions - posted by Hitesh Goyal <hi...@nlpcaptcha.com> on 2016/11/28 04:45:55 UTC, 3 replies.
- Spark app write too many small parquet files - posted by Kevin Tran <ke...@gmail.com> on 2016/11/28 05:44:34 UTC, 3 replies.
- [Spark R]: Does Spark R supports nonlinear optimization with nonlinear constraints. - posted by hi...@accenture.com on 2016/11/28 07:12:05 UTC, 0 replies.
- Do I have to wrap akka around spark streaming app? - posted by shyla deshpande <de...@gmail.com> on 2016/11/28 08:11:21 UTC, 10 replies.
- time to run Spark SQL query - posted by Hitesh Goyal <hi...@nlpcaptcha.com> on 2016/11/28 12:41:06 UTC, 1 replies.
- How to use logback - posted by Erwan ALLAIN <ea...@gmail.com> on 2016/11/28 14:00:57 UTC, 0 replies.
- Re: Spark Metrics: custom source/sink configurations not getting recognized - posted by Matthew Dailey <ma...@gmail.com> on 2016/11/28 19:48:36 UTC, 0 replies.
- What do I set rolling log to avoid filling up the disk? - posted by kant kodali <ka...@gmail.com> on 2016/11/28 22:21:39 UTC, 0 replies.
- How to disable write ahead logs? - posted by Tim Harsch <th...@cray.com> on 2016/11/29 00:04:07 UTC, 1 replies.
- null values returned by max() over a window function - posted by Han-Cheol Cho <ha...@nhn-techorus.com> on 2016/11/29 03:57:03 UTC, 1 replies.
- groupbykey data access size vs Reducer number - posted by memoryzpp <me...@gmail.com> on 2016/11/29 04:42:17 UTC, 0 replies.
- Re: Bit-wise AND operation between integers - posted by Reynold Xin <rx...@databricks.com> on 2016/11/29 05:42:32 UTC, 0 replies.
- Spark Streaming + Kinesis : Receiver MaxRate is violated - posted by dav009 <da...@gmail.com> on 2016/11/29 06:39:53 UTC, 1 replies.
- Java Collections.emptyList inserted as null object in cassandra - posted by Selvam Raman <se...@gmail.com> on 2016/11/29 13:47:54 UTC, 0 replies.
- python environments with "local" and "yarn-client" - Boto failing on HDP2.5 - posted by Andrew Holway <an...@otternetworks.de> on 2016/11/29 15:08:02 UTC, 0 replies.
- build models in parallel - posted by Xiaomeng Wan <sh...@gmail.com> on 2016/11/29 16:53:16 UTC, 1 replies.
- Porting LIBSVM models to Spark - posted by Pat Blachly <pb...@doximity.com> on 2016/11/29 18:18:03 UTC, 2 replies.
- Spark 2 Alternative to SparkContext clearJars()? - posted by lukasbradley <lu...@gmail.com> on 2016/11/29 18:52:02 UTC, 0 replies.
- Does MapWithState follow with a shuffle ? - posted by Amit Sela <am...@gmail.com> on 2016/11/29 21:16:52 UTC, 1 replies.
- Best approach to schedule Spark jobs - posted by Bruno Faria <br...@hotmail.com> on 2016/11/29 22:00:46 UTC, 2 replies.
- Spark Job not exited and shows running - posted by Selvam Raman <se...@gmail.com> on 2016/11/29 22:20:32 UTC, 0 replies.
- Fault-tolerant Accumulators in a DStream-only transformations. - posted by Amit Sela <am...@gmail.com> on 2016/11/29 22:42:25 UTC, 0 replies.
- Controlling data placement / locality - posted by Michael Johnson <mj...@yahoo.com.INVALID> on 2016/11/29 22:47:21 UTC, 0 replies.
- SVM regression in Spark - posted by roni <ro...@gmail.com> on 2016/11/30 02:50:25 UTC, 1 replies.
- [structured streaming] How to remove outdated data when use Window Operations - posted by Xinyu Zhang <ws...@163.com> on 2016/11/30 04:30:23 UTC, 0 replies.
- Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier) - posted by Timur Shenkao <ts...@timshenkao.su> on 2016/11/30 07:34:05 UTC, 4 replies.
- Logistic regression using gradient ascent - posted by Meeraj Kunnumpurath <me...@servicesymphony.com> on 2016/11/30 08:15:55 UTC, 0 replies.
- Can I have two different receivers for my Spark client program? - posted by kant kodali <ka...@gmail.com> on 2016/11/30 08:47:29 UTC, 0 replies.
- PySpark to remote cluster - posted by Klaus Schaefers <kl...@philips.com> on 2016/11/30 10:44:31 UTC, 1 replies.
- Parallel dynamic partitioning producing duplicated data - posted by Mehdi Ben Haj Abbes <me...@gmail.com> on 2016/11/30 15:12:44 UTC, 0 replies.
- SPARK 2.0 CSV exports (https://issues.apache.org/jira/browse/SPARK-16893) - posted by Gourav Sengupta <go...@gmail.com> on 2016/11/30 18:19:19 UTC, 0 replies.
- Save the date: ApacheCon Miami, May 15-19, 2017 - posted by Rich Bowen <rb...@apache.org> on 2016/11/30 19:24:52 UTC, 0 replies.
- updateStateByKey -- when the key is multi-column (like a composite key ) - posted by shyla deshpande <de...@gmail.com> on 2016/11/30 20:30:53 UTC, 1 replies.
- java.lang.Exception: Could not compute split, block input-0-1480539568000 not found - posted by kant kodali <ka...@gmail.com> on 2016/11/30 21:08:01 UTC, 1 replies.