- Re: eager? in dataframe's checkpoint - posted by Koert Kuipers <ko...@tresata.com> on 2017/02/01 00:16:03 UTC, 1 replies.
- Dataset Question: No Encoder found for Set[(scala.Long, scala.Long)] - posted by Jerry Lam <ch...@gmail.com> on 2017/02/01 00:33:55 UTC, 4 replies.
- Re: Resource Leak in Spark Streaming - posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com> on 2017/02/01 01:28:33 UTC, 5 replies.
- Re: JavaRDD text matadata(file name) findings - posted by Hyukjin Kwon <gu...@gmail.com> on 2017/02/01 02:45:46 UTC, 1 replies.
- Re: Converting timezones in Spark - posted by Don Drake <do...@gmail.com> on 2017/02/01 03:31:10 UTC, 0 replies.
- Parameterized types and Datasets - Spark 2.1.0 - posted by Don Drake <do...@gmail.com> on 2017/02/01 04:12:10 UTC, 4 replies.
- Re: does both below code do the same thing? I had to refactor code to fit in spark-sql - posted by Alex <si...@gmail.com> on 2017/02/01 05:38:17 UTC, 0 replies.
- Hive Java UDF running on spark-sql issue - posted by Alex <si...@gmail.com> on 2017/02/01 05:56:45 UTC, 2 replies.
- RE: Jars directory in Spark 2.0 - posted by Sidney Feiner <si...@startapp.com> on 2017/02/01 07:23:58 UTC, 2 replies.
- A question about inconsistency during dataframe creation with RDD/dict in PySpark - posted by Han-Cheol Cho <ha...@nhn-techorus.com> on 2017/02/01 10:25:07 UTC, 0 replies.
- Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0 - posted by Aseem Bansal <as...@gmail.com> on 2017/02/01 11:42:14 UTC, 2 replies.
- Re: tylerchapman@yahoo-inc.com is no longer with Yahoo! (was: Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0) - posted by Aseem Bansal <as...@gmail.com> on 2017/02/01 11:45:05 UTC, 0 replies.
- union of compatible types - posted by Koert Kuipers <ko...@tresata.com> on 2017/02/01 16:02:00 UTC, 0 replies.
- using withWatermark on Dataset - posted by Jerry Lam <ch...@gmail.com> on 2017/02/01 18:38:49 UTC, 1 replies.
- pivot over non numerical data - posted by Darshan Pandya <da...@gmail.com> on 2017/02/01 20:02:51 UTC, 2 replies.
- Re: increasing cross join speed - posted by Takeshi Yamamuro <li...@gmail.com> on 2017/02/02 06:18:04 UTC, 0 replies.
- Closing resources in the executor - posted by Appu K <ku...@gmail.com> on 2017/02/02 06:28:27 UTC, 1 replies.
- FP growth - Items in a transaction must be unique - posted by "Devi P.V" <de...@gmail.com> on 2017/02/02 07:17:22 UTC, 1 replies.
- Is it okay to run Hive Java UDFS in Spark-sql. Anybody's still doing it? - posted by Alex <si...@gmail.com> on 2017/02/02 08:33:29 UTC, 1 replies.
- filters Pushdown - posted by Peter Sg <pe...@varonis.com> on 2017/02/02 09:41:44 UTC, 4 replies.
- Suprised!!!!!Spark-shell showing inconsistent results - posted by Alex <si...@gmail.com> on 2017/02/02 10:03:08 UTC, 3 replies.
- Running a spark code on multiple machines using google cloud platform - posted by Anahita Talebi <an...@gmail.com> on 2017/02/02 12:29:58 UTC, 2 replies.
- frustration with field names in Dataset - posted by Koert Kuipers <ko...@tresata.com> on 2017/02/02 15:39:26 UTC, 3 replies.
- [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext - posted by Hollin Wilkins <ho...@combust.ml> on 2017/02/02 16:42:10 UTC, 15 replies.
- Re: HBase Spark - posted by Benjamin Kim <bb...@gmail.com> on 2017/02/02 18:28:52 UTC, 9 replies.
- Re: Dynamic resource allocation to Spark on Mesos - posted by Ji Yan <ji...@drive.ai> on 2017/02/02 20:41:14 UTC, 12 replies.
- Spark 2 - Creating datasets from dataframes with extra columns - posted by Don Drake <do...@gmail.com> on 2017/02/02 20:46:04 UTC, 3 replies.
- Spark 2 + Java + UDF + unknown return type... - posted by Jean Georges Perrin <jg...@jgp.net> on 2017/02/02 21:05:39 UTC, 2 replies.
- Spark: Scala Shell Very Slow (Unresponsive) - posted by jimitkr <ji...@softpath.net> on 2017/02/02 21:34:08 UTC, 1 replies.
- 4 days left to submit your abstract to Spark Summit SF - posted by Scott walent <sc...@gmail.com> on 2017/02/02 22:28:58 UTC, 0 replies.
- persistence iops and throughput check? Re: Running a spark code on multiple machines using google cloud platform - posted by Heji Kim <hs...@gmail.com> on 2017/02/03 00:50:27 UTC, 0 replies.
- dataset algos slow because of too many shuffles - posted by Koert Kuipers <ko...@tresata.com> on 2017/02/03 05:49:14 UTC, 0 replies.
- saveToCassandra issue. Please help - posted by shyla deshpande <de...@gmail.com> on 2017/02/03 08:45:22 UTC, 2 replies.
- Bipartite projection with Graphx - posted by balaji9058 <ks...@gmail.com> on 2017/02/03 10:24:31 UTC, 0 replies.
- problem with the method JavaDStream.foreachRDD() SparkStreaming - posted by Hamza HACHANI <ha...@supcom.tn> on 2017/02/03 13:45:00 UTC, 0 replies.
- Is DoubleWritable and DoubleObjectInspector doing the same thing in Hive UDF? - posted by Alex <si...@gmail.com> on 2017/02/03 14:49:25 UTC, 1 replies.
- NoNodeAvailableException (None of the configured nodes are available) error when trying to push data to Elastic from a Spark job - posted by Dmitry Goldenberg <dg...@gmail.com> on 2017/02/03 18:10:09 UTC, 4 replies.
- sqlContext vs spark. - posted by "☼ R Nair (रविशंकर नायर)" <ra...@gmail.com> on 2017/02/03 18:48:25 UTC, 1 replies.
- Spark submit on yarn does not return with exit code 1 on exception - posted by Shashank Mandil <ma...@gmail.com> on 2017/02/03 19:06:58 UTC, 4 replies.
- Re: How do I dynamically add nodes to spark standalone cluster and be able to discover them? - posted by kant kodali <ka...@gmail.com> on 2017/02/04 00:57:18 UTC, 1 replies.
- can I use Spark Standalone with HDFS but no YARN - posted by kant kodali <ka...@gmail.com> on 2017/02/04 06:08:17 UTC, 3 replies.
- Re: spark architecture question -- Pleas Read - posted by Mich Talebzadeh <mi...@gmail.com> on 2017/02/04 08:06:46 UTC, 4 replies.
- java.lang.NoSuchMethodError: scala.runtime.ObjectRef.zero()Lscala/runtime/ObjectRef - posted by sathyanarayanan mudhaliyar <sa...@gmail.com> on 2017/02/04 09:24:37 UTC, 1 replies.
- NullPointerException while joining two avro Hive tables - posted by Понькин Алексей <al...@ya.ru> on 2017/02/04 10:30:05 UTC, 0 replies.
- specifing schema on dataframe - posted by Sam Elamin <hu...@gmail.com> on 2017/02/04 13:46:00 UTC, 17 replies.
- Re: How to checkpoint and RDD after a stage and before reaching an action? - posted by Koert Kuipers <ko...@tresata.com> on 2017/02/04 16:26:55 UTC, 0 replies.
- SSpark streaming: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$ - posted by Mich Talebzadeh <mi...@gmail.com> on 2017/02/04 20:33:39 UTC, 3 replies.
- Mismatched datatype in Case statement - posted by Aviral Agarwal <av...@gmail.com> on 2017/02/04 20:50:16 UTC, 0 replies.
- Turning rows into columns - posted by Paul Tremblay <pa...@gmail.com> on 2017/02/04 21:25:24 UTC, 2 replies.
- High Availability/DR options for Spark applications - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2017/02/05 09:11:15 UTC, 2 replies.
- using an alternative slf4j implementation - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/02/05 14:53:56 UTC, 5 replies.
- Unsubscribe - posted by satish saley <sa...@gmail.com> on 2017/02/05 21:24:57 UTC, 1 replies.
- FileNotFoundException, while file is actually available - posted by Evgenii Morozov <ev...@gmail.com> on 2017/02/05 21:33:23 UTC, 0 replies.
- Invalid checkpoint file on spark 1.6.2 - posted by zitang qin <zi...@gmail.com> on 2017/02/05 22:45:08 UTC, 0 replies.
- Cannot read Hive Views in Spark SQL - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/02/06 01:19:05 UTC, 8 replies.
- How to specify "verbose GC" in Spark submit? - posted by "Md. Rezaul Karim" <re...@insight-centre.org> on 2017/02/06 13:02:25 UTC, 2 replies.
- PCA slow in comparison with single-threaded R version - posted by Marek Wiewiorka <ma...@gmail.com> on 2017/02/06 15:59:04 UTC, 0 replies.
- [Structured Streaming] Using File Sink to store to hive table. - posted by Egor Pahomov <pa...@gmail.com> on 2017/02/06 19:39:08 UTC, 11 replies.
- Spark mapPartition output object size coming larger than expected - posted by nitinkak001 <ni...@gmail.com> on 2017/02/06 20:21:49 UTC, 0 replies.
- wholeTextFiles fails, but textFile succeeds for same path - posted by Paul Tremblay <pa...@gmail.com> on 2017/02/06 22:35:27 UTC, 5 replies.
- How to get a spark sql statement implement duration ? - posted by Mars Xu <xu...@gmail.com> on 2017/02/07 03:17:08 UTC, 2 replies.
- Launching an Spark application in a subset of machines - posted by Alvaro Brandon <al...@gmail.com> on 2017/02/07 10:20:52 UTC, 6 replies.
- [Spark Context]: How to add on demand jobs to an existing spark context? - posted by Cosmin Posteuca <co...@gmail.com> on 2017/02/07 13:37:40 UTC, 11 replies.
- Fault tolerant broadcast in updateStateByKey - posted by Amit Sela <am...@gmail.com> on 2017/02/07 16:12:32 UTC, 4 replies.
- submit a spark code on google cloud - posted by Anahita Talebi <an...@gmail.com> on 2017/02/07 16:33:30 UTC, 3 replies.
- No topicDistributions(..) method in ml.clustering.LocalLDAModel - posted by sachintyagi22 <sa...@gmail.com> on 2017/02/07 17:03:03 UTC, 0 replies.
- Re: About saving a model file - posted by durgaswaroop <du...@oracle.com> on 2017/02/07 17:20:14 UTC, 0 replies.
- Spark streaming question - SPARK-13758 Need to use an external RDD inside DStream processing...Please help - posted by shyla deshpande <de...@gmail.com> on 2017/02/07 18:28:16 UTC, 2 replies.
- Un-exploding / denormalizing Spark SQL help - posted by Everett Anderson <ev...@nuna.com.INVALID> on 2017/02/07 19:02:28 UTC, 8 replies.
- Exception in spark streaming + kafka direct app - posted by Srikanth <sr...@gmail.com> on 2017/02/07 21:34:29 UTC, 2 replies.
- does persistence required for single action ? - posted by Shushant Arora <sh...@gmail.com> on 2017/02/08 02:09:30 UTC, 2 replies.
- [Spark 2.0.0] java.util.concurrent.TimeoutException while writing to mongodb from Spark - posted by Palash Gupta <sp...@yahoo.com.INVALID> on 2017/02/08 06:36:14 UTC, 0 replies.
- rdd save to orc file happened problems - posted by "446463844@qq.com" <44...@qq.com> on 2017/02/08 10:05:53 UTC, 0 replies.
- Dataset count on database or parquet - posted by Rohit Verma <ro...@rokittech.com> on 2017/02/08 10:58:53 UTC, 1 replies.
- JavaBean serialization with cyclic bean attributes - posted by Pascal Stammer <st...@deichbrise.de> on 2017/02/08 11:26:29 UTC, 0 replies.
- [Spark-SQL] Hive support is required to select over the following tables - posted by Daniel Haviv <da...@gmail.com> on 2017/02/08 13:13:36 UTC, 1 replies.
- FINAL REMINDER: CFP for ApacheCon closes February 11th - posted by Rich Bowen <rb...@apache.org> on 2017/02/08 14:09:58 UTC, 0 replies.
- Cluster to Cluster communication - posted by Vasu Gourabathina <vg...@gmail.com> on 2017/02/08 14:50:32 UTC, 0 replies.
- Spark's execution plan debugging - posted by Swapnil Shinde <sw...@gmail.com> on 2017/02/08 17:02:41 UTC, 4 replies.
- Issues launching job dynamically in Java - posted by yohann jardin <yo...@hotmail.com> on 2017/02/08 17:03:39 UTC, 0 replies.
- Union of DStream and RDD - posted by Amit Sela <am...@gmail.com> on 2017/02/08 20:32:56 UTC, 4 replies.
- Spark 2.0 Scala 2.11 and Kafka 0.10 Scala 2.10 - posted by "Uwe@Moosheimer.com" <Uw...@Moosheimer.com> on 2017/02/08 22:30:30 UTC, 1 replies.
- Spark 2.0.2 ML code fail - posted by Manish Tripathi <tr...@gmail.com> on 2017/02/08 22:41:11 UTC, 0 replies.
- Structured Streaming. S3 To Google BigQuery - posted by Sam Elamin <hu...@gmail.com> on 2017/02/08 23:54:47 UTC, 0 replies.
- Does Spark consider the free space of hard drive of the data nodes? - posted by Benyi Wang <be...@gmail.com> on 2017/02/09 00:32:14 UTC, 0 replies.
- Strange behavior with 'not' and filter pushdown - posted by Alexi Kostibas <ak...@nuna.com.INVALID> on 2017/02/09 00:56:58 UTC, 4 replies.
- [Spark 2.1.0] Spark SQL return correct count, but NULL on all fields - posted by Babak Alipour <ba...@gmail.com> on 2017/02/09 02:53:11 UTC, 0 replies.
- Counting things in Spark Structured Streaming - posted by Timothy Chan <tc...@lumoslabs.com> on 2017/02/09 03:10:23 UTC, 1 replies.
- Spark stream parallel streaming - posted by Udbhav Agarwal <ud...@syncoms.com> on 2017/02/09 05:37:06 UTC, 0 replies.
- MultiLabelBinarizer - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2017/02/09 05:37:54 UTC, 1 replies.
- [ANNOUNCE] Apache SystemML 0.12.0-incubating released. - posted by Arvind Surve <ac...@yahoo.com.INVALID> on 2017/02/09 05:53:07 UTC, 0 replies.
- Practical configuration to run LSH in Spark 2.1.0 - posted by nguyen duc Tuan <ne...@gmail.com> on 2017/02/09 07:55:31 UTC, 12 replies.
- java-lang-noclassdeffounderror-org-apache-spark-streaming-api-java-javastreamin - posted by sathyanarayanan mudhaliyar <sa...@gmail.com> on 2017/02/09 09:24:34 UTC, 0 replies.
- Updating variable in foreachRDD - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/02/09 10:27:13 UTC, 0 replies.
- Re: Performance bug in UDAF? - posted by Spark User <sp...@gmail.com> on 2017/02/09 21:25:53 UTC, 0 replies.
- Question about best Spark tuning - posted by Ji Yan <ji...@drive.ai> on 2017/02/09 22:11:56 UTC, 1 replies.
- Driver hung and happend out of memory while writing to console progress bar - posted by John Fang <xi...@alibaba-inc.com> on 2017/02/10 04:35:31 UTC, 2 replies.
- Re: Driver hung and happend out of memory while writing to console progress bar - posted by John Fang <xi...@alibaba-inc.com> on 2017/02/10 04:41:57 UTC, 0 replies.
- Is it better to Use Java or Python on Scala for Spark for using big data sets - posted by nancy henry <na...@gmail.com> on 2017/02/10 05:59:00 UTC, 3 replies.
- From C* to DataFrames with JSON - posted by Jean-Francois Gosselin <jf...@gmail.com> on 2017/02/10 06:12:29 UTC, 1 replies.
- Add hive-site.xml at runtime - posted by Shivam Sharma <28...@gmail.com> on 2017/02/10 06:21:23 UTC, 3 replies.
- Re: odd caching behavior or accounting - posted by Hbf <Ka...@dreizak.com> on 2017/02/10 06:21:31 UTC, 0 replies.
- Write JavaDStream to Kafka (how?) - posted by "Gutwein, Sebastian" <gu...@mail.hs-ulm.de> on 2017/02/10 10:08:20 UTC, 1 replies.
- HDFS Shell tool - posted by Vitásek, Ladislav <vi...@avast.com> on 2017/02/10 10:39:58 UTC, 0 replies.
- SQL warehouse dir - posted by Joseph Naegele <jn...@grierforensics.com> on 2017/02/10 16:30:06 UTC, 0 replies.
- Getting exit code of pipe() - posted by Xuchen Yao <ya...@gmail.com> on 2017/02/10 19:18:04 UTC, 3 replies.
- EC2 script is missing in Spark 2.0.0~2.1.0 - posted by "Md. Rezaul Karim" <re...@insight-centre.org> on 2017/02/11 12:34:16 UTC, 4 replies.
- Disable Spark SQL Optimizations for unit tests - posted by Stefan Ackermann <st...@zuehlke.com> on 2017/02/11 17:46:49 UTC, 1 replies.
- Case class with POJO - encoder issues - posted by Jason White <ja...@shopify.com> on 2017/02/12 00:19:26 UTC, 1 replies.
- Remove dependence on HDFS - posted by Benjamin Kim <bb...@gmail.com> on 2017/02/12 04:28:58 UTC, 5 replies.
- is dataframe thread safe? - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/02/12 08:06:28 UTC, 15 replies.
- Etl with spark - posted by Sam Elamin <hu...@gmail.com> on 2017/02/12 11:04:19 UTC, 4 replies.
- Repartition function duplicates data - posted by "F. Amara" <fa...@wso2.com> on 2017/02/13 05:46:58 UTC, 0 replies.
- How to measure IO time in Spark over S3 - posted by Gili Nachum <gi...@gmail.com> on 2017/02/13 06:55:16 UTC, 1 replies.
- Lost executor 4 Container killed by YARN for exceeding memory limits. - posted by nancy henry <na...@gmail.com> on 2017/02/13 10:27:45 UTC, 4 replies.
- Order of rows not preserved after cache + count + coalesce - posted by "David Haglund (external)" <Da...@husqvarnagroup.com> on 2017/02/13 12:51:48 UTC, 2 replies.
- Does Spark support heavy duty third party libraries? - posted by bhayes <Br...@informatica.com> on 2017/02/13 13:59:35 UTC, 0 replies.
- [Spark Launcher] How to launch parallel jobs? - posted by Cosmin Posteuca <co...@gmail.com> on 2017/02/13 15:05:13 UTC, 6 replies.
- Parquet Gzipped Files - posted by Benjamin Kim <bb...@gmail.com> on 2017/02/13 17:48:18 UTC, 2 replies.
- using spark-xml_2.10 to extract data from XML file - posted by "Carlo.Allocca" <ca...@open.ac.uk> on 2017/02/13 18:17:20 UTC, 4 replies.
- Re: Spark 2.1.0 issue with spark-shell and pyspark - posted by jerrytim <je...@126.com> on 2017/02/13 22:23:04 UTC, 1 replies.
- How to specify default value for StructField? - posted by vbegar <ve...@hpe.com> on 2017/02/13 22:54:32 UTC, 7 replies.
- Handling Skewness and Heterogeneity - posted by Anis Nasir <aa...@gmail.com> on 2017/02/14 08:01:53 UTC, 3 replies.
- wholeTextfiles not parallel, runs out of memory - posted by Henry Tremblay <pa...@gmail.com> on 2017/02/14 08:36:55 UTC, 4 replies.
- fault tolerant dataframe write with overwrite - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/02/14 10:22:06 UTC, 5 replies.
- Different Results When Performing PCA with Spark and R - posted by Amlan Jyoti <am...@tcs.com> on 2017/02/14 10:45:43 UTC, 0 replies.
- how to fix the order of data - posted by 萝卜丝炒饭 <14...@qq.com> on 2017/02/14 11:41:55 UTC, 0 replies.
- Re: how to fix the order of data - posted by Sam Elamin <hu...@gmail.com> on 2017/02/14 11:54:13 UTC, 1 replies.
- NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions writing to Hive - posted by nimrodo <ni...@veracity-group.com> on 2017/02/14 14:01:57 UTC, 0 replies.
- HiveContext on Spark 1.6 Linkage Error:ClassCastException - posted by Enrico DUrso <en...@everis.com> on 2017/02/14 14:17:26 UTC, 1 replies.
- Dealing with missing columns in SPARK SQL in JSON - posted by Aseem Bansal <as...@gmail.com> on 2017/02/14 14:30:35 UTC, 3 replies.
- Reusing HBase connection in transformations - posted by DandyDev <de...@gmail.com> on 2017/02/14 15:32:00 UTC, 0 replies.
- My spark job runs faster in spark 1.6 and much slower in spark 2.0 - posted by anatva <ar...@gmail.com> on 2017/02/14 20:25:06 UTC, 3 replies.
- streaming-kafka-0-8-integration (direct approach) and monitoring - posted by Mohammad Kargar <mk...@phemi.com> on 2017/02/15 00:03:16 UTC, 6 replies.
- PySpark: use one column to index another (udf of two columns?) - posted by apu <ap...@gmail.com> on 2017/02/15 00:11:32 UTC, 0 replies.
- RE: Can't load a RandomForestClassificationModel in Spark job - posted by Jianhong Xia <jx...@infoblox.com> on 2017/02/15 00:46:18 UTC, 3 replies.
- Spark Thrift Server - Skip header when load data from local file - posted by kumar r <ku...@gmail.com> on 2017/02/15 05:01:24 UTC, 1 replies.
- Spark executor memory and jvm heap memory usage metric - posted by satishl <sa...@gmail.com> on 2017/02/15 09:33:41 UTC, 0 replies.
- What is the practical use of "Peak Execution Memory" in Spark App Resource tuning - posted by satishl <sa...@gmail.com> on 2017/02/15 09:47:25 UTC, 0 replies.
- extracting eventlogs saved snappy format. - posted by satishl <sa...@gmail.com> on 2017/02/15 09:55:34 UTC, 1 replies.
- notebook connecting Spark On Yarn - posted by Sachin Aggarwal <di...@gmail.com> on 2017/02/15 11:41:10 UTC, 1 replies.
- Query data in subdirectories in Hive Partitions using Spark SQL - posted by Ahmed Kamal Abdelfatah <ah...@careem.com> on 2017/02/15 12:47:29 UTC, 2 replies.
- Latest Release of Receiver based Kafka Consumer for Spark Streaming. - posted by Dibyendu Bhattacharya <di...@gmail.com> on 2017/02/15 15:57:35 UTC, 0 replies.
- Regarding transformation with dataframe - posted by Gaurav Agarwal <ga...@gmail.com> on 2017/02/15 17:24:57 UTC, 0 replies.
- Enrichment with static tables - posted by Gaurav Agarwal <ga...@gmail.com> on 2017/02/15 18:34:02 UTC, 2 replies.
- [Spark Streaming WAL] custom java streaming receiver and the WAL - posted by "Charles O. Bajomo" <ch...@pretechconsulting.co.uk> on 2017/02/15 21:21:44 UTC, 0 replies.
- Remove .HiveStaging files - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/02/15 23:05:20 UTC, 1 replies.
- Spark Job Performance monitoring approaches - posted by Chetan Khatri <ch...@gmail.com> on 2017/02/16 05:15:12 UTC, 2 replies.
- Re: physical memory usage keep increasing for spark app on Yarn - posted by Yang Cao <cy...@gmail.com> on 2017/02/16 07:18:59 UTC, 0 replies.
- Pyspark: out of memory exception during model training - posted by mzaharchenko <ma...@gmail.com> on 2017/02/16 11:41:33 UTC, 0 replies.
- skewed data in join - posted by Gourav Sengupta <go...@gmail.com> on 2017/02/16 16:11:48 UTC, 4 replies.
- Pretty print a dataframe... - posted by Muthu Jayakumar <ba...@gmail.com> on 2017/02/16 16:26:40 UTC, 2 replies.
- scala.io.Source.fromFile protocol for hadoop - posted by nancy henry <na...@gmail.com> on 2017/02/16 16:54:18 UTC, 0 replies.
- Will Spark ever run the same task at the same time - posted by Ji Yan <ji...@drive.ai> on 2017/02/16 18:34:14 UTC, 2 replies.
- Latent Dirichlet Allocation in Spark - posted by Manish Tripathi <tr...@gmail.com> on 2017/02/16 18:36:30 UTC, 0 replies.
- Spark on Mesos with Docker in bridge networking mode - posted by cherryii <ch...@adobe.com> on 2017/02/16 19:00:09 UTC, 1 replies.
- Debugging Spark application - posted by "Md. Rezaul Karim" <re...@insight-centre.org> on 2017/02/16 22:00:10 UTC, 2 replies.
- Spark standalone cluster on EC2 error .. Checkpoint.. - posted by shyla deshpande <de...@gmail.com> on 2017/02/16 22:40:10 UTC, 2 replies.
- Spark Worker can't find jar submitted programmatically - posted by jeremycod <zo...@gmail.com> on 2017/02/17 00:46:19 UTC, 1 replies.
- how to give hdfs file path as argument to spark-submit - posted by nancy henry <na...@gmail.com> on 2017/02/17 09:12:58 UTC, 1 replies.
- How to convert RDD to DF for this case - - posted by Aakash Basu <aa...@gmail.com> on 2017/02/17 09:37:03 UTC, 3 replies.
- Graphx Examples for ALS - posted by balaji9058 <ks...@gmail.com> on 2017/02/17 13:07:43 UTC, 1 replies.
- I am not sure why I am getting java.lang.NoClassDefFoundError - posted by kant kodali <ka...@gmail.com> on 2017/02/17 17:40:19 UTC, 3 replies.
- Executor tab values in Spark Application UI - posted by satishl <sa...@gmail.com> on 2017/02/17 23:00:17 UTC, 1 replies.
- How do I increase readTimeoutMillis parameter in Spark-shell? - posted by kant kodali <ka...@gmail.com> on 2017/02/17 23:28:08 UTC, 0 replies.
- question on SPARK_WORKER_CORES - posted by kant kodali <ka...@gmail.com> on 2017/02/18 00:55:46 UTC, 7 replies.
- Serialization error - sql UDF related - posted by Darshan Pandya <da...@gmail.com> on 2017/02/18 03:36:28 UTC, 2 replies.
- Class Cast Exception while read from GS and write to S3.I feel gettng while writeing to s3. - posted by Manohar753 <ma...@happiestminds.com> on 2017/02/18 10:47:50 UTC, 0 replies.
- Avalance of warnings trying to read Spark 1.6.X Parquet into Spark 2.X - posted by Stephen Boesch <ja...@gmail.com> on 2017/02/18 19:50:12 UTC, 1 replies.
- Efficient Spark-Sql queries when only nth Column changes - posted by Patrick <ti...@gmail.com> on 2017/02/18 21:23:22 UTC, 3 replies.
- [Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream - posted by Neil Maheshwari <ne...@gmail.com> on 2017/02/19 09:28:51 UTC, 7 replies.
- Spark streaming on AWS EC2 error . Please help - posted by shyla deshpande <de...@gmail.com> on 2017/02/20 08:23:17 UTC, 0 replies.
- Basic Grouping Question - posted by Marco Mans <ma...@telemans.de> on 2017/02/20 11:23:50 UTC, 1 replies.
- Message loss in streaming even with graceful shutdown - posted by Noorul Islam K M <no...@noorul.com> on 2017/02/20 12:04:34 UTC, 0 replies.
- Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”? - posted by Muhammad Haseeb Javed <11...@seecs.edu.pk> on 2017/02/20 18:06:59 UTC, 6 replies.
- [SparkSQL] pre-check syntex before running spark job? - posted by Linyuxin <li...@huawei.com> on 2017/02/21 03:44:32 UTC, 4 replies.
- ClassLoader problem - java.io.InvalidClassException: scala.Option; local class incompatible - posted by Kohki Nishio <ta...@gmail.com> on 2017/02/21 06:36:37 UTC, 1 replies.
- please send me pom.xml for scala 2.10 - posted by nancy henry <na...@gmail.com> on 2017/02/21 09:06:04 UTC, 0 replies.
- How to query a query with not contain, not start_with, not end_with condition effective? - posted by Chanh Le <gi...@gmail.com> on 2017/02/21 09:56:37 UTC, 7 replies.
- Error when trying to filter - posted by Marco Mans <ma...@telemans.de> on 2017/02/21 11:43:26 UTC, 0 replies.
- CSV DStream to Hive - posted by nimrodo <ni...@veracity-group.com> on 2017/02/21 22:12:50 UTC, 1 replies.
- Re: [SparkSQL] pre-check syntex before running spark job? - posted by Linyuxin <li...@huawei.com> on 2017/02/22 02:04:19 UTC, 1 replies.
- Spark executors in streaming app always uses 2 executors - posted by satishl <sa...@gmail.com> on 2017/02/22 05:37:27 UTC, 1 replies.
- Spark SQL : Join operation failure - posted by jatinpreet <ja...@gmail.com> on 2017/02/22 06:11:04 UTC, 2 replies.
- spark sql: full outer join optimization - posted by Hongdi Ren <ry...@gmail.com> on 2017/02/22 06:32:04 UTC, 1 replies.
- [ANNOUNCE] Apache Bahir 2.1.0 Released - posted by Christian Kadner <ck...@apache.org> on 2017/02/22 12:07:41 UTC, 0 replies.
- Executor links in Job History - posted by yohann jardin <yo...@hotmail.com> on 2017/02/22 17:25:01 UTC, 0 replies.
- Re: Spark Streaming: Using external data during stream transformation - posted by Abhisheks <sm...@gmail.com> on 2017/02/22 18:02:41 UTC, 0 replies.
- Spark Streaming - parallel recovery - posted by Dominik Safaric <do...@gmail.com> on 2017/02/22 21:49:44 UTC, 0 replies.
- RDD blocks on Spark Driver - posted by pr...@gmail.com on 2017/02/23 01:02:46 UTC, 3 replies.
- Why spark history server does not show RDD even if it is persisted? - posted by Parag Chaudhari <pa...@gmail.com> on 2017/02/23 01:51:39 UTC, 6 replies.
- DataframeWriter - How to change filename extension - posted by Nirav Patel <np...@xactlycorp.com> on 2017/02/23 03:53:15 UTC, 0 replies.
- Is there any limit on number of tasks per stage attempt? - posted by Parag Chaudhari <pa...@gmail.com> on 2017/02/23 04:00:56 UTC, 2 replies.
- Is there a list of missing optimizations for typed functions? - posted by Justin Pihony <ju...@gmail.com> on 2017/02/23 06:52:31 UTC, 2 replies.
- quick question: best to use cluster mode or client mode for production? - posted by nancy henry <na...@gmail.com> on 2017/02/23 08:53:02 UTC, 1 replies.
- unsubscribe - posted by Donam Kim <ss...@gmail.com> on 2017/02/23 09:24:51 UTC, 2 replies.
- Support for decimal separator (comma or period) in spark 2.1 - posted by Arkadiusz Bicz <ar...@gmail.com> on 2017/02/23 10:14:10 UTC, 0 replies.
- Scala functions for dataframes - posted by Advait Mohan Raut <ad...@essexlg.com> on 2017/02/23 10:16:27 UTC, 0 replies.
- Spark join over sorted columns of dataset. - posted by Rohit Verma <ro...@rokittech.com> on 2017/02/23 10:17:46 UTC, 0 replies.
- New Amazon AMIs for EC2 script - posted by in4maniac <sa...@skimlinks.com> on 2017/02/23 12:23:12 UTC, 2 replies.
- Structured Streaming: How to handle bad input - posted by JayeshLalwani <Ja...@capitalone.com> on 2017/02/23 14:09:26 UTC, 2 replies.
- [Spark Streaming] Batch versus streaming - posted by "Charles O. Bajomo" <ch...@pretechconsulting.co.uk> on 2017/02/23 14:20:25 UTC, 0 replies.
- Shuffling on Dataframe to RDD conversion with a map transformation - posted by Patrick <ti...@gmail.com> on 2017/02/23 14:21:10 UTC, 1 replies.
- Get S3 Parquet File - posted by Benjamin Kim <bb...@gmail.com> on 2017/02/23 18:23:42 UTC, 9 replies.
- Spark executor on Docker runs as root - posted by Ji Yan <ji...@drive.ai> on 2017/02/23 19:33:21 UTC, 0 replies.
- Spark: Continuously reading data from Cassandra - posted by Tech Id <te...@gmail.com> on 2017/02/23 23:30:05 UTC, 0 replies.
- Apache Spark MLIB - posted by Mina Aslani <as...@gmail.com> on 2017/02/24 02:19:47 UTC, 1 replies.
- Fwd: Duplicate Rank for within same partitions - posted by Dana Ram Meghwal <da...@saavn.com> on 2017/02/24 07:08:08 UTC, 1 replies.
- care to share latest pom forspark scala applications eclipse? - posted by nancy henry <na...@gmail.com> on 2017/02/24 08:16:02 UTC, 1 replies.
- Duplicate Rank within same Partitions - posted by Dana Ram Meghwal <da...@saavn.com> on 2017/02/24 09:16:54 UTC, 0 replies.
- instrumenting Spark hit ratios - posted by Mich Talebzadeh <mi...@gmail.com> on 2017/02/25 10:31:03 UTC, 0 replies.
- Spark runs out of memory with small file - posted by Henry Tremblay <pa...@gmail.com> on 2017/02/25 19:33:18 UTC, 11 replies.
- No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException - posted by Raymond Xie <xi...@gmail.com> on 2017/02/25 20:50:05 UTC, 8 replies.
- PySpark + virtualenv: Using a different python path on the driver and on the executors - posted by Tomer Benyamini <to...@gmail.com> on 2017/02/25 20:50:34 UTC, 0 replies.
- pyspark in intellij - posted by Stephen Boesch <ja...@gmail.com> on 2017/02/26 01:55:52 UTC, 1 replies.
- Spark SQL table authority control? - posted by 李斌松 <li...@gmail.com> on 2017/02/26 03:50:09 UTC, 0 replies.
- Spark test error in ProactiveClosureSerializationSuite.scala - posted by 白也诗无敌 <44...@qq.com> on 2017/02/26 07:05:33 UTC, 0 replies.
- In Spark streaming, will saved kafka offsets become invalid if I change the number of partitions in a kafka topic? - posted by shyla deshpande <de...@gmail.com> on 2017/02/26 07:10:53 UTC, 1 replies.
- attempting to map Dataset[Row] - posted by Stephen Fletcher <st...@gmail.com> on 2017/02/26 12:31:19 UTC, 2 replies.
- Saving Structured Streaming DF to Hive Partitioned table - posted by nimrodo <ni...@veracity-group.com> on 2017/02/26 13:46:48 UTC, 0 replies.
- Re: Kafka Streaming and partitioning - posted by tonyye <an...@gmail.com> on 2017/02/26 15:09:23 UTC, 0 replies.
- Custom log4j.properties on AWS EMR - posted by Prithish <pr...@gmail.com> on 2017/02/26 16:31:55 UTC, 3 replies.
- Are we still dependent on Guava jar in Spark 2.1.0 as well? - posted by kant kodali <ka...@gmail.com> on 2017/02/26 20:00:17 UTC, 0 replies.
- SPark - YARN Cluster Mode - posted by ayan guha <gu...@gmail.com> on 2017/02/27 00:52:25 UTC, 3 replies.
- Re: Spark SQL table authority control? - posted by "yuyong.zhai" <yu...@ele.me> on 2017/02/27 02:19:53 UTC, 0 replies.
- java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext - posted by lk_spark <lk...@163.com> on 2017/02/27 03:14:00 UTC, 0 replies.
- Getting unrecoverable exception: java.lang.NullPointerException when trying to find wordcount in kafka topic - posted by Mina Aslani <as...@gmail.com> on 2017/02/27 04:00:43 UTC, 0 replies.
- Thrift server does not respect hive.server2.enable.doAs=true - posted by "yuyong.zhai" <yu...@ele.me> on 2017/02/27 06:04:33 UTC, 0 replies.
- How to do multiple join in pyspark - posted by lovemoon <zt...@163.com> on 2017/02/27 07:39:09 UTC, 0 replies.
- handling dependency conflicts with spark - posted by "Mendelson, Assaf" <As...@rsa.com> on 2017/02/27 10:57:08 UTC, 0 replies.
- How to set hive configs in Spark 2.1? - posted by SRK <sw...@gmail.com> on 2017/02/27 15:30:35 UTC, 2 replies.
- [Spark 2.1.0 ML] Serializing/Deserializing LocalLDA Problem - posted by Benjamin Edwards <bj...@gmail.com> on 2017/02/27 18:15:07 UTC, 0 replies.
- spark.speculation setting support on standalone mode? - posted by satishl <sa...@gmail.com> on 2017/02/27 20:42:43 UTC, 1 replies.
- [Spark Kafka] API Doc pages for Kafka 0.10 not current - posted by "Afshartous, Nick" <na...@wbgames.com> on 2017/02/27 21:01:38 UTC, 1 replies.
- using spark to load a data warehouse in real time - posted by Adaryl Wakefield <ad...@hotmail.com> on 2017/02/28 00:18:28 UTC, 6 replies.
- Error while enabling Hive Support in Spark 2.1 - posted by SRK <sw...@gmail.com> on 2017/02/28 01:15:36 UTC, 0 replies.
- Run spark machine learning example on Yarn failed - posted by Yunjie Ji <jy...@163.com> on 2017/02/28 01:18:33 UTC, 3 replies.
- spark append files to the same hdfs dir issue for LeaseExpiredException - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2017/02/28 09:35:00 UTC, 1 replies.
- Re: spark append files to the same hdfs dir issue for LeaseExpiredException - posted by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2017/02/28 10:47:47 UTC, 1 replies.
- graph.vertices.collect().foreach(println) - posted by balaji9058 <ks...@gmail.com> on 2017/02/28 12:37:43 UTC, 0 replies.
- How to use ManualClock with Spark streaming - posted by Hemalatha A <he...@googlemail.com> on 2017/02/28 14:53:32 UTC, 0 replies.
- spark-submit question - posted by Joe Olson <jo...@outlook.com> on 2017/02/28 15:05:03 UTC, 3 replies.
- DataFrame from in memory datasets in multiple JVMs - posted by johndesuv <de...@gmail.com> on 2017/02/28 16:02:51 UTC, 3 replies.
- Register Spark UDF for use with Hive Thriftserver/Beeline - posted by "Lavelle, Shawn" <Sh...@osii.com> on 2017/02/28 16:24:43 UTC, 1 replies.
- Spark - Not contains on Spark dataframe - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/02/28 16:49:24 UTC, 0 replies.
- Spark Streaming problem with Yarn - posted by Amjad ALSHABANI <as...@gmail.com> on 2017/02/28 16:50:51 UTC, 0 replies.
- toDebugString Vs Spark UI - posted by Vidya Sujeet <sj...@gmail.com> on 2017/02/28 17:17:40 UTC, 0 replies.
- Does monotonically_increasing_id generates the same id even when executor fails or being evicted out of memory - posted by Lan Jiang <la...@gmail.com> on 2017/02/28 19:12:46 UTC, 0 replies.
- Why Spark cannot get the derived field of case class in Dataset? - posted by Yong Zhang <ja...@hotmail.com> on 2017/02/28 20:03:53 UTC, 0 replies.
- global_temp database not getting created in Spark 2.x - posted by SRK <sw...@gmail.com> on 2017/02/28 20:12:35 UTC, 0 replies.
- How to configure global_temp database via Spark Conf - posted by SRK <sw...@gmail.com> on 2017/02/28 21:57:08 UTC, 0 replies.
- error in kafka producer - posted by shyla deshpande <de...@gmail.com> on 2017/02/28 22:23:34 UTC, 2 replies.