- Re: spark kafka consumer with kerberos - posted by Saisai Shao <sa...@gmail.com> on 2017/04/01 00:58:30 UTC, 0 replies.
- [Spark Core]: flatMap/reduceByKey seems to be quite slow with Long keys on some distributions - posted by Richard Tsai <ri...@gmail.com> on 2017/04/01 07:29:21 UTC, 0 replies.
- Convert Dataframe to Dataset in pyspark - posted by Selvam Raman <se...@gmail.com> on 2017/04/01 12:36:25 UTC, 1 replies.
- Cuesheet - spark deployment - posted by Deepu Raj <de...@outlook.com> on 2017/04/01 12:55:01 UTC, 0 replies.
- getting error while storing data in Hbase - posted by Chintan Bhatt <ch...@charusat.ac.in> on 2017/04/01 16:47:10 UTC, 0 replies.
- pyspark bug with PYTHONHASHSEED - posted by Paul Tremblay <pa...@gmail.com> on 2017/04/01 19:43:29 UTC, 0 replies.
- bug with PYTHONHASHSEED - posted by Paul Tremblay <pa...@gmail.com> on 2017/04/01 19:54:17 UTC, 5 replies.
- strange behavior of spark 2.1.0 - posted by Jiang Jacky <ji...@gmail.com> on 2017/04/01 20:14:30 UTC, 2 replies.
- read binary file in PySpark - posted by Yogesh Vyas <in...@gmail.com> on 2017/04/02 06:46:11 UTC, 0 replies.
- Partitioning strategy - posted by ja...@accenture.com on 2017/04/02 10:32:13 UTC, 1 replies.
- Does Apache Spark use any Dependency Injection framework? - posted by kant kodali <ka...@gmail.com> on 2017/04/02 13:28:17 UTC, 1 replies.
- Update DF record with delta data in spark - posted by Selvam Raman <se...@gmail.com> on 2017/04/02 13:57:35 UTC, 1 replies.
- Represent documents as a sequence of wordID & frequency and perform PCA - posted by Old-School <gi...@outlook.com> on 2017/04/02 14:51:47 UTC, 0 replies.
- Re: Spark SQL 2.1 Complex SQL - Query Planning Issue - posted by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2017/04/02 15:54:58 UTC, 0 replies.
- Graph Analytics on HBase with HGraphDB and Spark GraphFrames - posted by Robert Yokota <ra...@gmail.com> on 2017/04/02 16:40:07 UTC, 3 replies.
- Re: Looking at EMR Logs - posted by Paul Tremblay <pa...@gmail.com> on 2017/04/02 20:05:33 UTC, 0 replies.
- org.apache.spark.sql.AnalysisException: resolved attribute(s) code#906 missing from code#1992, - posted by grjohnson35 <gj...@artemishealth.com> on 2017/04/03 02:30:52 UTC, 0 replies.
- What is the difference between forEachAsync vs forEachPartitionAsync? - posted by kant kodali <ka...@gmail.com> on 2017/04/03 03:36:01 UTC, 1 replies.
- Benchmarking streaming frameworks - posted by gvdongen <gi...@ugent.be> on 2017/04/03 07:34:19 UTC, 2 replies.
- Do we support excluding the current row in PARTITION BY windowing functions? - posted by mathewwicks <ma...@gmail.com> on 2017/04/03 08:52:26 UTC, 2 replies.
- Read file and represent rows as Vectors - posted by Old-School <gi...@outlook.com> on 2017/04/03 12:05:18 UTC, 2 replies.
- Executor unable to pick postgres driver in Spark standalone cluster - posted by Rishikesh Teke <ri...@gmail.com> on 2017/04/03 13:43:51 UTC, 1 replies.
- Pyspark - pickle.PicklingError: Can't pickle - posted by Selvam Raman <se...@gmail.com> on 2017/04/03 18:34:37 UTC, 0 replies.
- _SUCCESS file validation on read - posted by drewrobb <dr...@gmail.com> on 2017/04/03 20:58:21 UTC, 0 replies.
- Re: Alternatives for dataframe collectAsList() - posted by Paul Tremblay <pa...@gmail.com> on 2017/04/04 00:44:49 UTC, 3 replies.
- Do we support excluding the CURRENT ROW in PARTITION BY windowing functions? - posted by mathewwicks <ma...@gmail.com> on 2017/04/04 02:21:34 UTC, 0 replies.
- map transform on array in spark sql - posted by Koert Kuipers <ko...@tresata.com> on 2017/04/04 03:18:17 UTC, 1 replies.
- is there a way to persist the lineages generated by spark? - posted by kant kodali <ka...@gmail.com> on 2017/04/04 03:19:40 UTC, 4 replies.
- how do i force unit test to do whole stage codegen - posted by Koert Kuipers <ko...@tresata.com> on 2017/04/04 20:10:55 UTC, 5 replies.
- Why do we ever run out of memory in Spark Structured Streaming? - posted by kant kodali <ka...@gmail.com> on 2017/04/05 00:17:54 UTC, 4 replies.
- With Twitter4j API, why am I not able to pull tweets with certain keywords? - posted by Gaurav1809 <ga...@gmail.com> on 2017/04/05 04:02:52 UTC, 1 replies.
- spark stages UI page has 'gc time' column Emtpy - posted by satishl <sa...@gmail.com> on 2017/04/05 04:24:06 UTC, 0 replies.
- reading binary file in spark-kafka streaming - posted by Yogesh Vyas <in...@gmail.com> on 2017/04/05 06:11:23 UTC, 0 replies.
- Market Basket Analysis by deploying FP Growth algorithm - posted by asethia <se...@gmail.com> on 2017/04/05 09:29:05 UTC, 1 replies.
- convert JavaRDD> to JavaRDD - posted by Hamza HACHANI <ha...@supcom.tn> on 2017/04/05 09:43:08 UTC, 4 replies.
- JSON lib works differently in spark-shell and IDE like intellij - posted by Mungeol Heo <mu...@gmail.com> on 2017/04/05 09:52:36 UTC, 1 replies.
- Re: How to use ManualClock with Spark streaming - posted by Hemalatha A <he...@googlemail.com> on 2017/04/05 10:59:21 UTC, 1 replies.
- Re: unit testing in spark - posted by Shiva Ramagopal <tr...@gmail.com> on 2017/04/05 11:32:15 UTC, 4 replies.
- Spark Streaming Kafka Job has strange behavior for certain tasks - posted by Justin Miller <ju...@protectwise.com> on 2017/04/05 17:03:41 UTC, 0 replies.
- Spark fair scheduler pools vs. YARN queues - posted by Nick Chammas <ni...@gmail.com> on 2017/04/05 19:27:07 UTC, 5 replies.
- Master-Worker communication on Standalone cluster issues - posted by map reduced <k3...@gmail.com> on 2017/04/05 21:01:41 UTC, 1 replies.
- run-time exception trying to train MultilayerPerceptronClassifier with DataFrame - posted by Pete Prokopowicz <pp...@groupon.com.INVALID> on 2017/04/05 21:21:03 UTC, 0 replies.
- Why chinese character gash appear when i use spark textFile? - posted by JoneZhang <jo...@gmail.com> on 2017/04/06 04:16:27 UTC, 0 replies.
- Why chinese character gash appear when i use spark textFile? - posted by Jone Zhang <jo...@gmail.com> on 2017/04/06 04:47:07 UTC, 1 replies.
- Consuming AWS Cloudwatch logs from Kinesis into Spark - posted by Tim Smith <se...@gmail.com> on 2017/04/06 04:47:15 UTC, 0 replies.
- Spark and Hive connection - posted by infa elance <in...@gmail.com> on 2017/04/06 05:06:16 UTC, 2 replies.
- How spark connects to Hive metastore? - posted by infaelance <in...@gmail.com> on 2017/04/06 05:20:58 UTC, 0 replies.
- use UTF-16 decode in pyspark streaming - posted by Yogesh Vyas <in...@gmail.com> on 2017/04/06 06:44:55 UTC, 0 replies.
- scala test is unable to initialize spark context. - posted by PS...@in.imshealth.com on 2017/04/06 08:03:44 UTC, 2 replies.
- Reading ASN.1 files in Spark - posted by Hamza HACHANI <ha...@supcom.tn> on 2017/04/06 09:09:03 UTC, 2 replies.
- How does partitioning happen for binary files in spark ? - posted by ashwini anand <aa...@gmail.com> on 2017/04/06 10:13:07 UTC, 2 replies.
- distinct query getting stuck at ShuffleBlockFetcherIterator - posted by Ramesh Krishnan <ra...@gmail.com> on 2017/04/06 11:54:34 UTC, 2 replies.
- Error while reading the CSV - posted by nayan sharma <na...@gmail.com> on 2017/04/06 13:06:39 UTC, 5 replies.
- Re: Error while reading the CSV - posted by Jörn Franke <jo...@gmail.com> on 2017/04/06 13:12:34 UTC, 5 replies.
- Is the trigger interval the same as batch interval in structured streaming? - posted by kant kodali <ka...@gmail.com> on 2017/04/06 17:26:02 UTC, 6 replies.
- df.count() returns one more count than SELECT COUNT() - posted by Mohamed Nadjib Mami <mo...@gmail.com> on 2017/04/06 17:29:26 UTC, 2 replies.
- What is the best way to run a scheduled spark batch job on AWS EC2 ? - posted by shyla deshpande <de...@gmail.com> on 2017/04/07 00:04:46 UTC, 16 replies.
- Apache Drill vs Spark SQL - posted by kant kodali <ka...@gmail.com> on 2017/04/07 05:34:20 UTC, 1 replies.
- Re: Returning DataFrame for text file - posted by "颜发才 (Yan Facai)" <fa...@gmail.com> on 2017/04/07 05:58:05 UTC, 1 replies.
- reading snappy eventlog files from hdfs using spark - posted by satishl <sa...@gmail.com> on 2017/04/07 06:34:29 UTC, 2 replies.
- Hi - posted by kant kodali <ka...@gmail.com> on 2017/04/07 09:27:41 UTC, 1 replies.
- Is checkpointing in Spark Streaming Synchronous or Asynchronous ? - posted by kant kodali <ka...@gmail.com> on 2017/04/07 10:19:45 UTC, 2 replies.
- Spark 2.1 ml library scalability - posted by Aseem Bansal <as...@gmail.com> on 2017/04/07 11:12:14 UTC, 3 replies.
- Cant convert Dataset to case class with Option fields - posted by Dirceu Semighini Filho <di...@gmail.com> on 2017/04/07 13:59:26 UTC, 1 replies.
- reducebykey - posted by Stephen Fletcher <st...@gmail.com> on 2017/04/07 14:26:01 UTC, 1 replies.
- Does Spark uses its own HDFS client? - posted by Alvaro Brandon <al...@gmail.com> on 2017/04/07 14:32:23 UTC, 2 replies.
- small job runs out of memory using wholeTextFiles - posted by Paul Tremblay <pa...@gmail.com> on 2017/04/07 14:57:14 UTC, 0 replies.
- Contributed to spark - posted by Stephen Fletcher <st...@gmail.com> on 2017/04/07 17:31:18 UTC, 1 replies.
- Structured streaming and writing output to Cassandra - posted by shyla deshpande <de...@gmail.com> on 2017/04/07 18:23:52 UTC, 1 replies.
- BucketedRandomProjectionLSHModel algorithm details - posted by vvinton <vi...@gmail.com> on 2017/04/07 20:34:55 UTC, 0 replies.
- Assigning a unique row ID - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/04/07 22:56:14 UTC, 6 replies.
- Why dataframe can be more efficient than dataset? - posted by Shiyuan <gs...@gmail.com> on 2017/04/08 18:15:18 UTC, 7 replies.
- How to convert Spark MLlib vector to ML Vector? - posted by "Md. Rezaul Karim" <re...@insight-centre.org> on 2017/04/09 14:01:13 UTC, 5 replies.
- Does spark 2.1.0 structured streaming support jdbc sink? - posted by Hemanth Gudela <he...@qvantel.com> on 2017/04/09 20:30:18 UTC, 4 replies.
- Spark 2.1 and Hive Metastore - posted by Benjamin Kim <bb...@gmail.com> on 2017/04/09 20:34:12 UTC, 0 replies.
- spark off heap memory - posted by Georg Heiler <ge...@gmail.com> on 2017/04/10 04:57:56 UTC, 0 replies.
- pandas DF DStream to Spark dataframe - posted by Yogesh Vyas <in...@gmail.com> on 2017/04/10 05:13:28 UTC, 0 replies.
- pandas DF Dstream to Spark DF - posted by Yogesh Vyas <in...@gmail.com> on 2017/04/10 05:19:27 UTC, 1 replies.
- Two Nodes :SparkContext Null Pointer - posted by Sriram <sr...@gmail.com> on 2017/04/10 07:33:32 UTC, 1 replies.
- Dataframes na fill with empty list - posted by Sumona Routh <su...@gmail.com> on 2017/04/11 01:18:33 UTC, 3 replies.
- Any NLP library for sentiment analysis in Spark? - posted by Gaurav1809 <ga...@gmail.com> on 2017/04/11 09:02:58 UTC, 9 replies.
- Spark (SQL / Structured Streaming) Cassandra - PreparedStatement - posted by Bastien DINE <ba...@coservit.com> on 2017/04/11 09:05:07 UTC, 0 replies.
- optimising storage and ec2 instances - posted by Zeming Yu <ze...@gmail.com> on 2017/04/11 10:07:08 UTC, 3 replies.
- Spark Streaming. Real-time save data and visualize on dashboard - posted by tencas <di...@gmail.com> on 2017/04/11 14:35:03 UTC, 4 replies.
- Feasability limits of joins in SparkSQL (Why does my driver explode with a large number of joins?) - posted by Rick Moritz <ra...@gmail.com> on 2017/04/11 17:15:50 UTC, 0 replies.
- Exception on Join with Spark2.1 - posted by Andrés Ivaldi <ia...@gmail.com> on 2017/04/11 19:22:52 UTC, 0 replies.
- [Spark-SQL] : Incremental load in Pyspark - posted by Vamsi Makkena <kv...@gmail.com> on 2017/04/11 19:23:31 UTC, 3 replies.
- Optimisation Tips - posted by Steve Robinson <St...@aquilainsight.com> on 2017/04/12 14:45:08 UTC, 2 replies.
- Hive ::: how to select where conditions dynamically using CASE - posted by nancy henry <na...@gmail.com> on 2017/04/12 15:41:38 UTC, 0 replies.
- Deploying Spark Applications. Best Practices And Patterns - posted by Sam Elamin <hu...@gmail.com> on 2017/04/12 20:11:16 UTC, 1 replies.
- Re: Design patterns involving Spark - posted by Harish Butani <rh...@gmail.com> on 2017/04/13 02:40:49 UTC, 0 replies.
- Avro/Parquet GenericFixed decimal is not read into Spark correctly - posted by Justin Pihony <ju...@gmail.com> on 2017/04/13 03:12:15 UTC, 0 replies.
- unsubscribe - posted by tian zhang <tz...@yahoo.com.INVALID> on 2017/04/13 07:17:07 UTC, 0 replies.
- checkpoint - posted by issues solution <is...@gmail.com> on 2017/04/13 09:03:06 UTC, 3 replies.
- Hive Context and SQL Context interoperability - posted by Deepak Sharma <de...@gmail.com> on 2017/04/13 09:41:08 UTC, 0 replies.
- why we can t apply udf on rdd ??? - posted by issues solution <is...@gmail.com> on 2017/04/13 09:52:40 UTC, 1 replies.
- checkpoint how to use correctly checkpoint with udf - posted by issues solution <is...@gmail.com> on 2017/04/13 12:02:25 UTC, 0 replies.
- commons.lang3.time incompatible - posted by Mars Xu <xu...@gmail.com> on 2017/04/13 12:52:24 UTC, 0 replies.
- How to coorect code after java.lang.stackoverflow - posted by issues solution <is...@gmail.com> on 2017/04/13 13:01:38 UTC, 0 replies.
- Number of column in data frame - posted by issues solution <is...@gmail.com> on 2017/04/13 13:12:51 UTC, 0 replies.
- how to master cache and chekpoint for pyspark - posted by issues solution <is...@gmail.com> on 2017/04/13 13:25:17 UTC, 0 replies.
- Fwd: ERROR Dropping SparkListenerEvent - posted by Patrick Gomes <go...@gmail.com> on 2017/04/13 14:10:53 UTC, 0 replies.
- Yarn containers getting killed, error 52, multiple joins - posted by rachmaninovquartet <ra...@gmail.com> on 2017/04/13 15:08:40 UTC, 2 replies.
- SPARK-20325 - Spark Structured Streaming documentation Update: checkpoint configuration - posted by Katherin Eri <ka...@gmail.com> on 2017/04/14 08:15:02 UTC, 2 replies.
- Spark API authentication - posted by Sergey <gr...@yandex.ru> on 2017/04/14 09:18:53 UTC, 5 replies.
- Parameter in FlatMap function - posted by "Soheila S." <so...@gmail.com> on 2017/04/14 11:32:42 UTC, 1 replies.
- create column with map function apply to dataframe - posted by issues solution <is...@gmail.com> on 2017/04/14 13:07:12 UTC, 1 replies.
- PySpark row_number Question - posted by infa elance <in...@gmail.com> on 2017/04/14 15:19:33 UTC, 1 replies.
- Memory problems with simple ETL in Pyspark - posted by Patrick McCarthy <pm...@dstillery.com> on 2017/04/14 16:10:15 UTC, 4 replies.
- Spark Testing Library Discussion - posted by Holden Karau <ho...@pigscanfly.ca> on 2017/04/14 18:17:18 UTC, 11 replies.
- Driver spins hours in query plan optimization - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/04/14 20:39:05 UTC, 0 replies.
- NPE in UDF yet no nulls in data because analyzer runs test with nulls - posted by Koert Kuipers <ko...@tresata.com> on 2017/04/14 22:57:50 UTC, 0 replies.
- Join streams Apache Spark - posted by tencas <di...@gmail.com> on 2017/04/15 20:12:09 UTC, 0 replies.
- Problem with Execution plan using loop - posted by Javier Rey <jr...@gmail.com> on 2017/04/16 03:31:37 UTC, 2 replies.
- Spark SQL (Pyspark) - Parallel processing of multiple datasets - posted by Amol Patil <am...@gmail.com> on 2017/04/16 22:52:05 UTC, 5 replies.
- Shall I use Apache Zeppelin for data analytics & visualization? - posted by Gaurav1809 <ga...@gmail.com> on 2017/04/17 04:55:20 UTC, 6 replies.
- How to store 10M records in HDFS to speed up further filtering? - posted by MoTao <mo...@sensetime.com> on 2017/04/17 06:23:22 UTC, 3 replies.
- Re: How to store 10M records in HDFS to speed up further filtering? - posted by 莫涛 <mo...@sensetime.com> on 2017/04/17 07:01:49 UTC, 3 replies.
- Re: Re: How to store 10M records in HDFS to speed up further filtering? - posted by 莫涛 <mo...@sensetime.com> on 2017/04/17 08:23:56 UTC, 3 replies.
- Spark-shell's performance - posted by Richard Hanson <rh...@mailbox.org> on 2017/04/17 10:18:08 UTC, 1 replies.
- Invalidating/Remove complete mapWithState state - posted by Matthias Niehoff <ma...@codecentric.de> on 2017/04/17 12:13:03 UTC, 2 replies.
- how to add new column using regular expression within pyspark dataframe - posted by Zeming Yu <ze...@gmail.com> on 2017/04/17 12:25:06 UTC, 8 replies.
- isin query - posted by nayan sharma <na...@gmail.com> on 2017/04/17 14:35:19 UTC, 3 replies.
- filter operation using isin - posted by nayan sharma <na...@gmail.com> on 2017/04/17 14:42:25 UTC, 0 replies.
- Handling skewed data - posted by Vishnu Viswanath <vi...@gmail.com> on 2017/04/17 15:17:45 UTC, 1 replies.
- Is there a way to tell if a receiver is a Reliable Receiver? - posted by Justin Pihony <ju...@gmail.com> on 2017/04/17 19:34:21 UTC, 1 replies.
- Application not found in RM - posted by Mohammad Tariq <do...@gmail.com> on 2017/04/18 00:41:34 UTC, 0 replies.
- Running 100 GB at standalone node - posted by Vivek Mishra <vm...@impetus.com> on 2017/04/18 09:10:28 UTC, 0 replies.
- In an executor, are the Python worker memory and the MemoryOverhead overlapping? - posted by o_rayer <au...@artefact.is> on 2017/04/18 15:01:24 UTC, 0 replies.
- CfP - VHPC at ISC extension - Papers due May 2 - posted by VHPC 17 <vh...@gmail.com> on 2017/04/18 17:36:22 UTC, 0 replies.
- An Apache Spark metric sink for Kafka - posted by Erik Erlandson <ej...@redhat.com> on 2017/04/18 19:40:33 UTC, 0 replies.
- Spark 2.1.0 hanging while writing a table in HDFS in parquet format - posted by gae123 <pk...@gae123.com> on 2017/04/18 21:26:16 UTC, 0 replies.
- How to fix error "Failed to get records for..." after polling for 120000 - posted by Dmitry Goldenberg <dg...@hexastax.com> on 2017/04/18 22:22:09 UTC, 0 replies.
- Questions on HDFS with Spark - posted by kant kodali <ka...@gmail.com> on 2017/04/18 23:07:58 UTC, 0 replies.
- java.lang.java.lang.UnsupportedOperationException - posted by issues solution <is...@gmail.com> on 2017/04/19 11:42:17 UTC, 2 replies.
- Real time incremental Update to Spark Graphs. - posted by Siddharth Ubale <si...@syncoms.com> on 2017/04/19 12:09:10 UTC, 0 replies.
- Problem with Java and Scala interoperability // streaming - posted by kant kodali <ka...@gmail.com> on 2017/04/19 20:42:46 UTC, 5 replies.
- JDBC write error of Pyspark dataframe - posted by Cinyoung Hur <ci...@gmail.com> on 2017/04/20 01:42:53 UTC, 0 replies.
- checkpoint on spark standalone - posted by Vivek Mishra <vm...@impetus.com> on 2017/04/20 07:28:31 UTC, 0 replies.
- Concurrent DataFrame.saveAsTable into non-existant tables fails the second job despite Mode.APPEND - posted by Rick Moritz <ra...@gmail.com> on 2017/04/20 07:48:19 UTC, 1 replies.
- Re: Re: Re: How to store 10M records in HDFS to speed up further filtering? - posted by 莫涛 <mo...@sensetime.com> on 2017/04/20 09:09:23 UTC, 2 replies.
- [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ? - posted by Radhwane Chebaane <r....@mindlytix.com> on 2017/04/20 11:30:14 UTC, 1 replies.
- Maximum Partitioner size - posted by Patrick GRANDJEAN <pa...@yahoo.fr.INVALID> on 2017/04/20 14:57:23 UTC, 0 replies.
- Re: Any Idea about this error : IllegalArgumentException: File segment length cannot be negative ? - posted by Victor Tso-Guillen <vt...@paxata.com> on 2017/04/20 15:51:38 UTC, 0 replies.
- Spark structured streaming: Is it possible to periodically refresh static data frame? - posted by Hemanth Gudela <he...@qvantel.com> on 2017/04/20 20:08:57 UTC, 9 replies.
- Long Shuffle Read Blocked Time - posted by Pradeep Gollakota <pr...@gmail.com> on 2017/04/20 22:12:20 UTC, 1 replies.
- Please participate in a research survey on graphs - posted by Siddhartha Sahu <s3...@uwaterloo.ca> on 2017/04/21 00:58:52 UTC, 0 replies.
- pyspark.sql.DataFrame write error to Postgres DB - posted by Cinyoung Hur <ci...@gmail.com> on 2017/04/21 02:54:28 UTC, 2 replies.
- Azure Event Hub with Pyspark - posted by ayan guha <gu...@gmail.com> on 2017/04/21 03:49:31 UTC, 5 replies.
- splitting a huge file - posted by Paul Tremblay <pa...@gmail.com> on 2017/04/21 18:36:31 UTC, 3 replies.
- What is correct behavior for spark.task.maxFailures? - posted by "Chawla,Sumit " <su...@gmail.com> on 2017/04/21 20:32:26 UTC, 4 replies.
- question regarding pyspark - posted by "Afshin, Bardia" <Ba...@capitalone.com> on 2017/04/21 23:37:58 UTC, 1 replies.
- how to get variable type signature through the api - posted by Matthew Purdy <bu...@gmail.com> on 2017/04/22 05:20:49 UTC, 0 replies.
- Spark SQL - Global Temporary View is not behaving as expected - posted by Hemanth Gudela <he...@qvantel.com> on 2017/04/22 07:56:51 UTC, 6 replies.
- Off heap memory settings and Tungsten - posted by geoHeil <ge...@gmail.com> on 2017/04/22 11:44:18 UTC, 1 replies.
- heap overflow within seconds : pyspark kinesis stream with Spark 2.1.0 - posted by s t <se...@hotmail.com> on 2017/04/22 20:20:33 UTC, 0 replies.
- Cannot convert from JavaRDD to Dataframe - posted by "Chen, Mingrui" <mi...@mail.smu.edu> on 2017/04/23 16:13:47 UTC, 1 replies.
- How to convert Dstream of JsonObject to Dataframe in spark 2.1.0? - posted by kant kodali <ka...@gmail.com> on 2017/04/23 17:50:19 UTC, 2 replies.
- Questions related to writing data to S3 - posted by Richard Hanson <rh...@mailbox.org> on 2017/04/23 18:49:41 UTC, 1 replies.
- accessing type signature - posted by Bulldog20630405 <bu...@gmail.com> on 2017/04/24 00:35:15 UTC, 0 replies.
- Authorizations in thriftserver - posted by vincent gromakowski <vi...@gmail.com> on 2017/04/24 07:32:20 UTC, 1 replies.
- Spark Mlib - java.lang.OutOfMemoryError: Java heap space - posted by Selvam Raman <se...@gmail.com> on 2017/04/24 10:22:28 UTC, 1 replies.
- Spark diclines mesos offers - posted by Pavel Plotnikov <pa...@team.wrike.com> on 2017/04/24 11:53:03 UTC, 2 replies.
- Spark registered view in "Future" - View changes updated in "Future" are lost in main thread - posted by Hemanth Gudela <he...@qvantel.com> on 2017/04/24 12:29:34 UTC, 0 replies.
- How to maintain order of key-value in DataFrame same as JSON? - posted by Devender Yadav <de...@impetus.co.in> on 2017/04/24 12:45:59 UTC, 3 replies.
- how to create List in pyspark - posted by Selvam Raman <se...@gmail.com> on 2017/04/24 16:27:51 UTC, 2 replies.
- removing columns from file - posted by "Afshin, Bardia" <Ba...@capitalone.com> on 2017/04/24 16:48:43 UTC, 1 replies.
- community feedback on RedShift with Spark - posted by "Afshin, Bardia" <Ba...@capitalone.com> on 2017/04/24 17:07:07 UTC, 2 replies.
- Arraylist is empty after JavaRDD.foreach - posted by Devender Yadav <de...@impetus.co.in> on 2017/04/24 17:36:18 UTC, 0 replies.
- How to convert DataFrame to JSON String in Java 7 - posted by Devender Yadav <de...@impetus.co.in> on 2017/04/24 17:44:01 UTC, 0 replies.
- Re: Arraylist is empty after JavaRDD.foreach - posted by Jörn Franke <jo...@gmail.com> on 2017/04/24 17:45:08 UTC, 2 replies.
- Spark-SQL Query Optimization: overlapping ranges - posted by "Lavelle, Shawn" <Sh...@osii.com> on 2017/04/24 22:46:11 UTC, 3 replies.
- udf that handles null values - posted by Zeming Yu <ze...@gmail.com> on 2017/04/25 00:22:49 UTC, 3 replies.
- pyspark vector - posted by Zeming Yu <ze...@gmail.com> on 2017/04/25 00:36:12 UTC, 2 replies.
- one hot encode a column of vector - posted by Zeming Yu <ze...@gmail.com> on 2017/04/25 01:31:45 UTC, 1 replies.
- how to find the nearest holiday - posted by Zeming Yu <ze...@gmail.com> on 2017/04/25 07:39:36 UTC, 4 replies.
- spark streaming resiliency - posted by vincent gromakowski <vi...@gmail.com> on 2017/04/25 13:14:30 UTC, 0 replies.
- Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll - posted by Dominik Safaric <do...@gmail.com> on 2017/04/25 19:43:48 UTC, 7 replies.
- weird error message - posted by "Afshin, Bardia" <Ba...@capitalone.com> on 2017/04/25 23:57:24 UTC, 4 replies.
- [ann] Release of TensorFrames 0.2.8 - posted by Tim Hunter <ti...@databricks.com> on 2017/04/26 00:24:08 UTC, 0 replies.
- WrappedArray to row of relational Db - posted by vaibhavrtk <va...@gmail.com> on 2017/04/26 06:29:13 UTC, 0 replies.
- Create dataframe from RDBMS table using JDBC - posted by Devender Yadav <de...@impetus.co.in> on 2017/04/26 07:26:29 UTC, 1 replies.
- Last chance: ApacheCon is just three weeks away - posted by Rich Bowen <rb...@rcbowen.com> on 2017/04/26 13:46:12 UTC, 0 replies.
- Calculate mode separately for multiple columns in row - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/04/26 17:21:30 UTC, 1 replies.
- stand-alone deployMode=cluster problems - posted by Lanny Ripple <la...@spotright.com> on 2017/04/26 18:54:36 UTC, 0 replies.
- help/suggestions to setup spark cluster - posted by anna stax <an...@gmail.com> on 2017/04/26 21:02:49 UTC, 6 replies.
- How to create SparkSession using SparkConf? - posted by kant kodali <ka...@gmail.com> on 2017/04/26 22:22:53 UTC, 6 replies.
- 10th Spark Summit 2017 at Moscone Center - posted by Jules Damji <dm...@comcast.net> on 2017/04/27 00:49:30 UTC, 0 replies.
- [Spark Core] Why SetAccumulator is buried in org.apache.spark.sql.execution.debug? - posted by "v.chesnokov" <v....@inforion.ru> on 2017/04/27 10:14:17 UTC, 0 replies.
- Synonym handling replacement issue with UDF in Apache Spark - posted by Nishanth <ni...@yahoo.com.INVALID> on 2017/04/27 13:58:55 UTC, 2 replies.
- javaRDD to collectasMap throuwa ava.lang.NegativeArraySizeException - posted by Manohar753 <ma...@happiestminds.com> on 2017/04/27 14:31:59 UTC, 0 replies.
- Data Skew in Dataframe Groupby - Any suggestions? - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/04/27 15:27:27 UTC, 0 replies.
- [Pyspark, Python 2.7] Executor hangup caused by Unicode error while logging uncaught exception in worker - posted by Sebastian Nagel <wa...@googlemail.com> on 2017/04/27 16:03:14 UTC, 0 replies.
- Initialize Gaussian Mixture Model using Spark ML dataframe API - posted by Tim Smith <se...@gmail.com> on 2017/04/27 17:46:53 UTC, 0 replies.
- Why "Initial job has not accepted any resources"? - posted by Yuan Fang <yf...@advisorsoftware.com> on 2017/04/27 21:51:12 UTC, 1 replies.
- Has anyone used CoreNLP from stanford for sentiment analysis in Spark? It does not work as desired for me. - posted by Gaurav1809 <ga...@gmail.com> on 2017/04/28 08:05:53 UTC, 2 replies.
- "java.lang.IllegalStateException: There is no space for new record" in GraphFrames - posted by rok <ro...@gmail.com> on 2017/04/28 08:42:33 UTC, 1 replies.
- Securing Spark Job on Cluster - posted by Shashi Vishwakarma <sh...@gmail.com> on 2017/04/28 12:45:47 UTC, 7 replies.
- Exactly-once semantics with kakfa CanCommitOffsets.commitAsync? - posted by David Rosenstrauch <da...@gmail.com> on 2017/04/28 15:29:25 UTC, 3 replies.
- Spark user list seems to be rejecting/ignoring my emails from other subscribed address - posted by David Rosenstrauch <da...@gmail.com> on 2017/04/28 15:33:57 UTC, 0 replies.
- Could any one please tell me why this takes forever to finish? - posted by Yuan Fang <yf...@advisorsoftware.com> on 2017/04/28 22:35:55 UTC, 0 replies.
- Spark repartition question... - posted by Muthu Jayakumar <ba...@gmail.com> on 2017/04/30 07:07:51 UTC, 0 replies.
- parquet optimal file structure - flat vs nested - posted by Zeming Yu <ze...@gmail.com> on 2017/04/30 08:19:03 UTC, 5 replies.
- Recommended cluster parameters - posted by rakesh sharma <ra...@hotmail.com> on 2017/04/30 08:26:05 UTC, 2 replies.
- examples of dealing with nested parquet/ dataframe file - posted by Zeming Yu <ze...@gmail.com> on 2017/04/30 13:08:41 UTC, 0 replies.