user@spark.apache.org, 2016-05

- spark.streaming.concurrentJobs parameter in Spark Streaming - posted by chandan prakash <ch...@gmail.com> on 2016/05/01 04:29:38 UTC, 9 replies.
- Re: Error in spark-xml - posted by Hyukjin Kwon <gu...@gmail.com> on 2016/05/01 08:11:09 UTC, 2 replies.
- Re: Bit(N) on create Table with MSSQLServer - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/01 10:23:06 UTC, 5 replies.
- Spark 1.6.1 issue fetching data via JDBC in Spark-shell - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/01 11:40:56 UTC, 0 replies.
- Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: unknown error - posted by sunday2000 <23...@qq.com> on 2016/05/01 13:19:55 UTC, 2 replies.
- Can not import KafkaProducer in spark streaming job - posted by fanooos <de...@gmail.com> on 2016/05/01 14:11:43 UTC, 3 replies.
- Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: unknown error - posted by sunday2000 <23...@qq.com> on 2016/05/01 15:32:36 UTC, 0 replies.
- Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: unknown error - posted by Ted Yu <yu...@gmail.com> on 2016/05/01 15:50:28 UTC, 0 replies.
- Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: unknown error - posted by sunday2000 <23...@qq.com> on 2016/05/01 16:00:32 UTC, 1 replies.
- Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: unknown error - posted by Ted Yu <yu...@gmail.com> on 2016/05/01 18:09:06 UTC, 0 replies.
- Is DataFrame randomSplit Deterministic? - posted by Brandon White <bw...@gmail.com> on 2016/05/01 20:37:30 UTC, 0 replies.
- Re: Spark on AWS - posted by Teng Qiu <te...@gmail.com> on 2016/05/02 01:54:55 UTC, 1 replies.
- using amazon STS with spark - posted by Luke Rohde <ro...@gmail.com> on 2016/05/02 02:35:26 UTC, 0 replies.
- SparkSQL with large result size - posted by Buntu Dev <bu...@gmail.com> on 2016/05/02 06:19:15 UTC, 7 replies.
- Re: Why Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1: Could not transfer artifact org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2): repo1.maven.org: unknown error - posted by sunday2000 <23...@qq.com> on 2016/05/02 08:20:07 UTC, 0 replies.
- A signature in Logging.class refers to type Logger in package org.slf4j which is not available. - posted by Kapil Raaj <ca...@gmail.com> on 2016/05/02 09:03:07 UTC, 0 replies.
- kafka direct streaming python API fromOffsets - posted by Tigran Avanesov <ti...@olamobile.com> on 2016/05/02 12:54:56 UTC, 3 replies.
- REST API submission and Application ID - posted by th...@orange.com on 2016/05/02 14:31:34 UTC, 0 replies.
- RE: Reading from Amazon S3 - posted by Jinan Alhajjaj <j....@hotmail.com> on 2016/05/02 16:37:02 UTC, 5 replies.
- Spark standalone workers, executors and JVMs - posted by captainfranz <ca...@gmail.com> on 2016/05/02 18:21:36 UTC, 5 replies.
- zero-length input partitions from parquet - posted by Han JU <ju...@gmail.com> on 2016/05/02 18:22:31 UTC, 0 replies.
- Weird results with Spark SQL Outer joins - posted by kpeng1 <kp...@gmail.com> on 2016/05/02 18:58:23 UTC, 17 replies.
- java.lang.NullPointerException while performing rdd.SaveToCassandra - posted by meson10 <sp...@piyushverma.net> on 2016/05/02 19:32:39 UTC, 2 replies.
- Improving performance of a kafka spark streaming app - posted by Colin Kincaid Williams <di...@uw.edu> on 2016/05/02 19:54:41 UTC, 8 replies.
- how to orderBy previous groupBy.count.orderBy in pyspark - posted by webe3vt <we...@aim.com> on 2016/05/02 21:01:27 UTC, 1 replies.
- QueryExecution to String breaks with OOM - posted by Brandon White <bw...@gmail.com> on 2016/05/02 22:02:19 UTC, 0 replies.
- Re: Number of executors change during job running - posted by Vikash Pareek <vi...@infoobjects.com> on 2016/05/02 22:51:04 UTC, 0 replies.
- Redirect from yarn to spark history server - posted by satish saley <sa...@gmail.com> on 2016/05/02 23:14:52 UTC, 1 replies.
- Re: Spark Streaming UI duration numbers mismatch - posted by Jatin Kumar <jk...@rocketfuelinc.com.INVALID> on 2016/05/03 01:09:39 UTC, 0 replies.
- Spark build failure with com.oracle:ojdbc6:jar:11.2.0.1.0 - posted by Hien Luu <hi...@gmail.com> on 2016/05/03 02:51:56 UTC, 7 replies.
- Error from reading S3 in Scala - posted by "Zhang, Jingyu" <ji...@news.com.au> on 2016/05/03 02:53:32 UTC, 5 replies.
- spark 1.6.1 build failure of : scala-maven-plugin - posted by sunday2000 <23...@qq.com> on 2016/05/03 04:18:34 UTC, 5 replies.
- Consume WebService in Spark - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/05/03 04:45:53 UTC, 1 replies.
- Re: spark 1.6.1 build failure of : scala-maven-plugin - posted by sunday2000 <23...@qq.com> on 2016/05/03 04:51:23 UTC, 4 replies.
- Performance benchmarking of Spark Vs other languages - posted by Abhijith Chandraprabhu <ab...@gmail.com> on 2016/05/03 07:02:51 UTC, 1 replies.
- Submit job to spark cluster Error ErrorMonitor dropping message... - posted by Tenghuan He <te...@gmail.com> on 2016/05/03 08:51:50 UTC, 0 replies.
- Spark streaming app starts processing when kill that app - posted by Shams ul Haque <sh...@cashcare.in> on 2016/05/03 09:05:01 UTC, 3 replies.
- Clear Threshold in Logistic Regression ML Pipeline - posted by Abhishek Anand <ab...@gmail.com> on 2016/05/03 09:19:33 UTC, 0 replies.
- Re: removing header from csv file - posted by Abhishek Anand <ab...@gmail.com> on 2016/05/03 09:23:31 UTC, 2 replies.
- parquet table in spark-sql - posted by 喜之郎 <25...@qq.com> on 2016/05/03 12:49:57 UTC, 2 replies.
- [Spark 1.5.2] Spark dataframes vs sql query -performance parameter ? - posted by Divya Gehlot <di...@gmail.com> on 2016/05/03 14:45:39 UTC, 0 replies.
- unsubscribe - posted by Rodrick Brown <ro...@orchard-app.com> on 2016/05/03 15:58:41 UTC, 7 replies.
- Multiple Spark Applications that use Cassandra, how to share resources/nodes - posted by Tobias Eriksson <to...@qvantel.com> on 2016/05/03 16:34:28 UTC, 4 replies.
- --jars for mesos cluster - posted by Alex Dzhagriev <dz...@gmail.com> on 2016/05/03 18:11:18 UTC, 0 replies.
- Re: yarn-cluster - posted by nsalian <ns...@cloudera.com> on 2016/05/03 19:53:25 UTC, 1 replies.
- Spark 1.5.2 Shuffle Blocks - running out of memory - posted by Nirav Patel <np...@xactlycorp.com> on 2016/05/03 20:18:41 UTC, 1 replies.
- Free memory while launching jobs. - posted by Renato Perini <re...@gmail.com> on 2016/05/03 20:56:06 UTC, 1 replies.
- Re: Creating new Spark context when running in Secure YARN fails - posted by nsalian <ns...@cloudera.com> on 2016/05/03 21:06:35 UTC, 0 replies.
- Re: a question about --executor-cores - posted by nsalian <ns...@cloudera.com> on 2016/05/03 21:17:17 UTC, 0 replies.
- Re: Error while running jar using spark-submit on another machine - posted by nsalian <ns...@cloudera.com> on 2016/05/03 21:27:24 UTC, 0 replies.
- Calculating log-loss for the trained model in Spark ML - posted by Abhishek Anand <ab...@gmail.com> on 2016/05/03 21:28:42 UTC, 0 replies.
- Alternative to groupByKey() + mapValues() for non-commutative, non-associative aggregate? - posted by Bibudh Lahiri <bi...@gmail.com> on 2016/05/03 23:29:47 UTC, 1 replies.
- migration from Teradata to Spark SQL - posted by Tapan Upadhyay <ta...@gmail.com> on 2016/05/04 03:29:35 UTC, 6 replies.
- Re: parquet table in spark-sql - posted by 喜之郎 <25...@qq.com> on 2016/05/04 03:49:58 UTC, 0 replies.
- run-example streaming.KafkaWordCount fails on CDH 5.7.0 - posted by Michel Hubert <mi...@phact.nl> on 2016/05/04 08:29:57 UTC, 3 replies.
- substitute mapPartitions by distinct - posted by Batselem <se...@gmail.com> on 2016/05/04 08:56:56 UTC, 0 replies.
- Spark Select Statement - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/05/04 09:39:19 UTC, 2 replies.
- restrict my spark app to run on specific machines - posted by Shams ul Haque <sh...@cashcare.in> on 2016/05/04 10:03:36 UTC, 1 replies.
- spark w/ scala 2.11 and PackratParsers - posted by matd <ma...@gmail.com> on 2016/05/04 12:12:30 UTC, 0 replies.
- Re: Spark and Kafka direct approach problem - posted by أنس الليثي <de...@gmail.com> on 2016/05/04 13:17:03 UTC, 3 replies.
- Reading from cassandra store in rdd - posted by Yasemin Kaya <go...@gmail.com> on 2016/05/04 13:36:57 UTC, 0 replies.
- Spark MLLib benchmarks - posted by kmurph <k....@qub.ac.uk> on 2016/05/04 14:21:22 UTC, 0 replies.
- IS spark have CapacityScheduler? - posted by 开心延年 <mu...@qq.com> on 2016/05/04 14:44:26 UTC, 1 replies.
- groupBy and store in parquet - posted by Michal Vince <vi...@gmail.com> on 2016/05/04 14:47:32 UTC, 4 replies.
- Performance with Insert overwrite into Hive Table. - posted by Bijay Kumar Pathak <bk...@mtu.edu> on 2016/05/04 17:02:38 UTC, 2 replies.
- spark job stage failures - posted by Prajwal Tuladhar <pr...@infynyxx.com> on 2016/05/04 19:40:01 UTC, 0 replies.
- Re: PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..." - posted by HLee <hw...@csu.fullerton.edu> on 2016/05/04 19:51:48 UTC, 0 replies.
- Writing output of key-value Pair RDD - posted by "Afshartous, Nick" <na...@turbine.com> on 2016/05/04 20:09:08 UTC, 3 replies.
- Stackoverflowerror in scala.collection - posted by BenD <be...@bigbluebubble.com> on 2016/05/04 21:04:52 UTC, 1 replies.
- DAG Pipelines? - posted by Cesar Flores <ce...@gmail.com> on 2016/05/04 21:25:11 UTC, 0 replies.
- SqlContext parquet read OutOfMemoryError: Requested array size exceeds VM limit error - posted by Bijay Kumar Pathak <bk...@mtu.edu> on 2016/05/04 21:44:15 UTC, 3 replies.
- Do I need to install Cassandra node on Spark Master node to work with Cassandra? - posted by Vinayak Agrawal <vi...@gmail.com> on 2016/05/05 00:36:09 UTC, 1 replies.
- DeepSpark: where to start - posted by Joice Joy <jo...@gmail.com> on 2016/05/05 02:42:56 UTC, 5 replies.
- ArrayIndexOutOfBoundsException in model selection via cross-validation sample with spark 1.6.1 - posted by Terry Hoo <hu...@gmail.com> on 2016/05/05 03:48:57 UTC, 0 replies.
- Locality aware tree reduction - posted by aymkhalil <ay...@gmail.com> on 2016/05/05 05:54:50 UTC, 1 replies.
- Mllib using model to predict probability - posted by colin <co...@sina.cn> on 2016/05/05 05:59:43 UTC, 1 replies.
- Access S3 bucket using IAM roles - posted by Jyotiska <jy...@gmail.com> on 2016/05/05 10:40:42 UTC, 0 replies.
- package for data quality in Spark 1.5.2 - posted by Divya Gehlot <di...@gmail.com> on 2016/05/05 10:51:03 UTC, 3 replies.
- H2O + Spark Streaming? - posted by diplomatic Guru <di...@gmail.com> on 2016/05/05 15:26:22 UTC, 1 replies.
- Could we use Sparkling Water Lib with Spark Streaming - posted by diplomatic Guru <di...@gmail.com> on 2016/05/05 15:34:03 UTC, 0 replies.
- Individual DStream Checkpointing in Spark Streaming - posted by Akash Mishra <ak...@gmail.com> on 2016/05/05 15:41:01 UTC, 0 replies.
- Missing data in Kafka Consumer - posted by Jerry <je...@gmail.com> on 2016/05/05 16:18:51 UTC, 4 replies.
- Content-based Recommendation Engine - posted by Sree Eedupuganti <sr...@inndata.in> on 2016/05/05 16:26:17 UTC, 2 replies.
- mesos cluster mode - posted by satish saley <sa...@gmail.com> on 2016/05/05 17:32:59 UTC, 0 replies.
- Accessing JSON array in Spark SQL - posted by Xinh Huynh <xi...@gmail.com> on 2016/05/05 18:53:22 UTC, 1 replies.
- Re: Spark Streaming, Batch interval, Windows length and Sliding Interval settings - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/05 18:56:23 UTC, 0 replies.
- SortWithinPartitions on DataFrame - posted by Darshan Singh <da...@gmail.com> on 2016/05/05 20:23:17 UTC, 0 replies.
- How long should logistic regression take on this data? - posted by Bibudh Lahiri <bi...@gmail.com> on 2016/05/05 22:37:06 UTC, 0 replies.
- Disable parquet metadata summary in - posted by Bijay Kumar Pathak <bk...@mtu.edu> on 2016/05/06 00:42:58 UTC, 0 replies.
- [Spark 1.5.2 ]-how to set and get Storage level for Dataframe - posted by Divya Gehlot <di...@gmail.com> on 2016/05/06 03:06:45 UTC, 2 replies.
- Fw: Significant performance difference for same spark job in scala vs pyspark - posted by pratik gawande <pr...@hotmail.com> on 2016/05/06 04:47:18 UTC, 3 replies.
- Spark structured streaming is Micro batch? - posted by madhu phatak <ph...@gmail.com> on 2016/05/06 08:07:21 UTC, 3 replies.
- Error Kafka/Spark. Ran out of messages before reaching ending offset - posted by Guillermo Ortiz <ko...@gmail.com> on 2016/05/06 09:05:00 UTC, 4 replies.
- Found Data Quality check package for Spark - posted by Divya Gehlot <di...@gmail.com> on 2016/05/06 10:33:39 UTC, 1 replies.
- Spark UI only shows lines belonging to py4j lib - posted by cmbendre <ch...@gmail.com> on 2016/05/06 12:19:05 UTC, 0 replies.
- TaskEnd Metrics - posted by Manivannan Selvadurai <sm...@gmail.com> on 2016/05/06 12:21:39 UTC, 0 replies.
- Updating Values Inside Foreach Rdd loop - posted by HARSH TAKKAR <ta...@gmail.com> on 2016/05/06 12:25:35 UTC, 6 replies.
- Reading text file from Amazon S3 - posted by Jinan Alhajjaj <j....@hotmail.com> on 2016/05/06 12:31:24 UTC, 0 replies.
- getting NullPointerException while doing left outer join - posted by Adam Westerman <as...@gmail.com> on 2016/05/06 13:57:51 UTC, 3 replies.
- Spark Web UI issue - posted by Pietro Gentile <pi...@gmail.com> on 2016/05/06 14:08:47 UTC, 0 replies.
- Reading Shuffle Data from highly loaded nodes - posted by Alvaro Brandon <al...@gmail.com> on 2016/05/06 14:38:04 UTC, 1 replies.
- createDirectStream with offsets - posted by Eric Friedman <er...@gmail.com> on 2016/05/06 14:47:43 UTC, 3 replies.
- Sliding Average over Window in Spark Streaming - posted by Matthias Niehoff <ma...@codecentric.de> on 2016/05/06 14:54:55 UTC, 2 replies.
- Adhoc queries on Spark 2.0 with Structured Streaming - posted by Sunita Arvind <su...@gmail.com> on 2016/05/06 16:21:24 UTC, 7 replies.
- CreateProcess error=5, Access is denied when trying SparkLauncher example in Win10 - posted by Augusto Uehara <gu...@gmail.com> on 2016/05/06 17:08:12 UTC, 0 replies.
- killing spark job which is submitted using SparkSubmit - posted by satish saley <sa...@gmail.com> on 2016/05/06 18:50:39 UTC, 4 replies.
- Working out min() and max() values in Spark streaming sliding interval - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/06 20:31:21 UTC, 0 replies.
- Spark Streaming : cpu cores max utilization - posted by chandan prakash <ch...@gmail.com> on 2016/05/07 03:08:06 UTC, 3 replies.
- Correct way of setting executor numbers and executor cores in Spark 1.6.1 for non-clustered mode ? - posted by kmurph <k....@qub.ac.uk> on 2016/05/07 11:03:40 UTC, 3 replies.
- Finding max value in spark streaming sliding window - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/07 15:09:43 UTC, 0 replies.
- How to verify if spark is using kryo serializer for shuffle - posted by Nirav Patel <np...@xactlycorp.com> on 2016/05/07 18:13:42 UTC, 8 replies.
- sqlCtx.read.parquet yields lots of small tasks - posted by "Johnny W." <jz...@gmail.com> on 2016/05/07 21:50:29 UTC, 4 replies.
- Rename hive orc table caused no content in spark - posted by yansqrt3 <ya...@gmail.com> on 2016/05/08 04:08:08 UTC, 0 replies.
- Joining a RDD to a Dataframe - posted by Cyril Scetbon <cy...@free.fr> on 2016/05/08 06:41:16 UTC, 4 replies.
- pyspark dataframe sort issue - posted by Buntu Dev <bu...@gmail.com> on 2016/05/08 06:48:58 UTC, 2 replies.
- Is it a bug? - posted by Sisyphuss <zh...@gmail.com> on 2016/05/08 09:14:25 UTC, 5 replies.
- different SqlContext with same udf name with different meaning - posted by Igor Berman <ig...@gmail.com> on 2016/05/08 16:49:52 UTC, 0 replies.
- BlockManager crashing applications - posted by Brandon White <bw...@gmail.com> on 2016/05/08 20:01:30 UTC, 3 replies.
- Parse Json in Spark - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/05/08 22:20:28 UTC, 4 replies.
- partitioner aware subtract - posted by Raghava Mutharaju <m....@gmail.com> on 2016/05/09 03:18:00 UTC, 4 replies.
- How big the spark stream window could be ? - posted by "kramer2009@126.com" <kr...@126.com> on 2016/05/09 03:19:28 UTC, 27 replies.
- Re: Spark support for Complex Event Processing (CEP) - posted by Esa Heikkinen <es...@student.tut.fi> on 2016/05/09 09:36:10 UTC, 0 replies.
- Kafka 0.9 and spark-streaming-kafka_2.10 - posted by Michel Hubert <mi...@phact.nl> on 2016/05/09 09:55:21 UTC, 1 replies.
- java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest - posted by Guillermo Ortiz <ko...@gmail.com> on 2016/05/09 10:51:45 UTC, 5 replies.
- apache spark on gitter? - posted by Paweł Szulc <pa...@gmail.com> on 2016/05/09 11:45:33 UTC, 7 replies.
- ERROR SparkContext: Error initializing SparkContext. - posted by Andrew Holway <an...@otternetworks.de> on 2016/05/09 13:45:46 UTC, 1 replies.
- No of Spark context per jvm - posted by praveen S <my...@gmail.com> on 2016/05/09 14:16:20 UTC, 1 replies.
- Streaming application slows over time - posted by Bryan Jeffrey <br...@gmail.com> on 2016/05/09 14:32:10 UTC, 1 replies.
- Help understanding an exception that produces multiple stack traces - posted by James Casiraghi <jc...@algebraixdata.com> on 2016/05/09 14:39:49 UTC, 0 replies.
- Why I have memory leaking for such simple spark stream code? - posted by "kramer2009@126.com" <kr...@126.com> on 2016/05/09 15:03:23 UTC, 1 replies.
- what is "spark.history.retainedApplications" points to - posted by neeraj_yadav <ne...@outlook.com> on 2016/05/09 15:12:45 UTC, 0 replies.
- Re: How to use pyspark streaming module "slice"? - posted by sethirot <as...@combotag.com> on 2016/05/09 16:32:07 UTC, 1 replies.
- StreamingLinearRegression Java example - posted by diplomatic Guru <di...@gmail.com> on 2016/05/09 17:12:21 UTC, 0 replies.
- DataFrame cannot find temporary table - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/05/09 17:33:34 UTC, 2 replies.
- Spark-csv- partitionBy - posted by "Mail.com" <pr...@mail.com> on 2016/05/09 18:12:54 UTC, 3 replies.
- spark 2.0 issue with yarn? - posted by Jesse F Chen <jf...@us.ibm.com> on 2016/05/09 20:24:46 UTC, 7 replies.
- Accumulator question - posted by Abi <an...@gmail.com> on 2016/05/10 00:24:06 UTC, 3 replies.
- best fit - Dataframe and spark sql use cases - posted by Divya Gehlot <di...@gmail.com> on 2016/05/10 02:36:44 UTC, 1 replies.
- Accessing Cassandra data from Spark Shell - posted by Cassa L <lc...@gmail.com> on 2016/05/10 04:08:40 UTC, 5 replies.
- spark 1.6 : RDD Partitions not distributed evenly to executors - posted by prateek arora <pr...@gmail.com> on 2016/05/10 04:58:43 UTC, 0 replies.
- Pyspark with non default hive table - posted by ayan guha <gu...@gmail.com> on 2016/05/10 06:22:22 UTC, 0 replies.
- spark uploading resource error - posted by 朱旻 <zz...@126.com> on 2016/05/10 07:51:41 UTC, 4 replies.
- Reading table schema from Cassandra - posted by justneeraj <ju...@gmail.com> on 2016/05/10 09:21:36 UTC, 1 replies.
- Spark Streaming : is spark.streaming.receiver.maxRate valid for DirectKafkaApproach - posted by chandan prakash <ch...@gmail.com> on 2016/05/10 12:02:24 UTC, 5 replies.
- Init/Setup worker - posted by Lionel PERRIN <li...@hotmail.com> on 2016/05/10 14:37:09 UTC, 1 replies.
- Evenly balance the number of items in each RDD partition - posted by Ayman Khalil <ay...@gmail.com> on 2016/05/10 17:38:51 UTC, 6 replies.
- Cluster Migration - posted by Ajay Chander <it...@gmail.com> on 2016/05/10 17:57:42 UTC, 6 replies.
- pyspark mappartions () - posted by Abi <an...@gmail.com> on 2016/05/10 18:20:25 UTC, 4 replies.
- Pyspark accumulator - posted by Abi <an...@gmail.com> on 2016/05/10 18:24:41 UTC, 2 replies.
- Hi test - posted by Abi <an...@gmail.com> on 2016/05/10 18:30:22 UTC, 0 replies.
- Spark crashes with Filesystem recovery - posted by Imran Akbar <sk...@gmail.com> on 2016/05/10 19:52:43 UTC, 1 replies.
- Not able pass 3rd party jars to mesos executors - posted by gpatcham <gp...@gmail.com> on 2016/05/10 20:43:36 UTC, 6 replies.
- Re: Save DataFrame to HBase - posted by Benjamin Kim <bb...@gmail.com> on 2016/05/10 21:53:44 UTC, 2 replies.
- Reliability of JMS Custom Receiver in Spark Streaming JMS - posted by Sourav Mazumder <so...@gmail.com> on 2016/05/10 22:17:20 UTC, 1 replies.
- Spark 1.6 Catalyst optimizer - posted by Telmo Rodrigues <te...@gmail.com> on 2016/05/11 00:57:10 UTC, 7 replies.
- Unable to write stream record to cassandra table with multiple columns - posted by Anand N Ilkal <an...@gmail.com> on 2016/05/11 02:01:18 UTC, 0 replies.
- What does the spark stand alone cluster do? - posted by "kramer2009@126.com" <kr...@126.com> on 2016/05/11 02:24:18 UTC, 0 replies.
- Will the HiveContext cause memory leak ? - posted by "kramer2009@126.com" <kr...@126.com> on 2016/05/11 03:25:07 UTC, 8 replies.
- How to resolve Scheduling delay in Spark streaming applications? - posted by Hemalatha A <he...@googlemail.com> on 2016/05/11 04:31:12 UTC, 0 replies.
- Spark hanging forever when doing decision tree training - posted by Loic Quertenmont <lo...@gmail.com> on 2016/05/11 06:20:41 UTC, 0 replies.
- not able to write to cassandra table from spark - posted by anandnilkal <an...@gmail.com> on 2016/05/11 06:28:57 UTC, 0 replies.
- [Spark 1.5.2]Check Foreign Key constraint - posted by Divya Gehlot <di...@gmail.com> on 2016/05/11 07:57:55 UTC, 2 replies.
- Use Collaborative Filtering and Clustering Algorithm in Spark MLIB - posted by Imre Nagi <im...@gmail.com> on 2016/05/11 08:29:57 UTC, 0 replies.
- java.lang.ClassCastException: org.apache.spark.util.SerializableConfiguration cannot be cast to [B - posted by Daniel Haviv <da...@veracity-group.com> on 2016/05/11 10:40:16 UTC, 1 replies.
- Error: "Answer from Java side is empty" - posted by AlexModestov <Al...@gmail.com> on 2016/05/11 11:56:55 UTC, 0 replies.
- dataframe udf functioin will be executed twice when filter on new column created by withColumn - posted by Tony Jin <li...@gmail.com> on 2016/05/11 13:55:08 UTC, 2 replies.
- Spark 1.6.0: substring on df.select - posted by Bharathi Raja <ra...@yahoo.com.INVALID> on 2016/05/11 15:07:07 UTC, 0 replies.
- Re: Spark 1.6.0: substring on df.select - posted by Raghavendra Pandey <ra...@gmail.com> on 2016/05/11 15:34:11 UTC, 3 replies.
- Setting Spark Worker Memory - posted by شجاع الرحمن بیگ <sh...@gmail.com> on 2016/05/11 15:38:07 UTC, 5 replies.
- Spark on DSE Cassandra with multiple data centers - posted by Simone Franzini <ca...@gmail.com> on 2016/05/11 16:15:41 UTC, 0 replies.
- Datasets is extremely slow in comparison to RDD in standalone mode WordCount examlpe - posted by Amit Sela <am...@gmail.com> on 2016/05/11 17:59:40 UTC, 4 replies.
- Is this possible to do in spark ? - posted by Pradeep Nayak <pr...@gmail.com> on 2016/05/11 18:36:30 UTC, 1 replies.
- How to take executor memory dump - posted by Nirav Patel <np...@xactlycorp.com> on 2016/05/11 18:38:51 UTC, 0 replies.
- kryo - posted by Younes Naguib <Yo...@tritondigital.com> on 2016/05/11 21:18:52 UTC, 3 replies.
- How to transform a JSON string into a Java HashMap<> java.io.NotSerializableException - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/05/11 22:55:30 UTC, 0 replies.
- Re: How to transform a JSON string into a Java HashMap<> java.io.NotSerializableException - posted by Marcelo Vanzin <va...@cloudera.com> on 2016/05/11 22:59:28 UTC, 0 replies.
- When start spark-sql, postgresql gives errors. - posted by Joseph <wx...@sina.com> on 2016/05/12 03:53:22 UTC, 1 replies.
- Will this affect the result of spark? - posted by sunday2000 <23...@qq.com> on 2016/05/12 04:04:12 UTC, 0 replies.
- Graceful shutdown of spark streaming on yarn - posted by "Rakesh H (Marketing Platform-BLR)" <ra...@flipkart.com> on 2016/05/12 06:12:56 UTC, 9 replies.
- parallelism of task executor worker threads during s3 reads - posted by sanusha <an...@gmail.com> on 2016/05/12 07:30:56 UTC, 0 replies.
- Submitting Job to YARN-Cluster using Spark Job Server - posted by ashesh_28 <as...@gmail.com> on 2016/05/12 07:48:11 UTC, 0 replies.
- Re: Need for advice - performance improvement and out of memory resolution - posted by AlexModestov <Al...@gmail.com> on 2016/05/12 08:47:54 UTC, 0 replies.
- Re: ML regression - spark context dies without error - posted by AlexModestov <Al...@gmail.com> on 2016/05/12 09:03:26 UTC, 0 replies.
- Why spark give out this error message? - posted by sunday2000 <23...@qq.com> on 2016/05/12 09:07:54 UTC, 0 replies.
- My notes on Spark Performance & Tuning Guide - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/12 10:08:14 UTC, 12 replies.
- SQLContext and HiveContext parse a query string differently ? - posted by Hao Ren <in...@gmail.com> on 2016/05/12 11:09:34 UTC, 3 replies.
- Spark SQL: Managed memory leak detected - posted by bi...@gmail.com on 2016/05/12 11:27:27 UTC, 0 replies.
- Efficient for loops in Spark - posted by flyinggip <my...@hotmail.com> on 2016/05/12 13:34:13 UTC, 1 replies.
- LinearRegressionWithSGD fails on 12Mb data - posted by RainDev <9r...@gmail.com> on 2016/05/12 16:05:41 UTC, 0 replies.
- mllib random forest - executor heartbeat timed out - posted by vtkmh <ke...@hotmail.com> on 2016/05/12 17:08:17 UTC, 0 replies.
- S3A Creating Task Per Byte (pyspark / 1.6.1) - posted by Aaron Jackson <aj...@pobox.com> on 2016/05/12 17:35:31 UTC, 1 replies.
- XML Processing using Spark SQL - posted by Arunkumar Chandrasekar <ch...@gmail.com> on 2016/05/12 20:03:04 UTC, 2 replies.
- How to get and save core dump of native library in executors - posted by prateek arora <pr...@gmail.com> on 2016/05/12 21:23:27 UTC, 2 replies.
- Re: How to get and save core dump of native library in executors - posted by Ted Yu <yu...@gmail.com> on 2016/05/12 21:40:00 UTC, 1 replies.
- Spark handling spill overs - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/05/12 23:07:01 UTC, 2 replies.
- Confused - returning RDDs from functions - posted by Do...@ODDO, od...@gmail.com on 2016/05/13 03:06:39 UTC, 1 replies.
- Why spark 1.6.1 run so slow? - posted by sunday2000 <23...@qq.com> on 2016/05/13 03:30:14 UTC, 0 replies.
- sbt for Spark build with Scala 2.11 - posted by Raghava Mutharaju <m....@gmail.com> on 2016/05/13 04:01:03 UTC, 5 replies.
- Re: High virtual memory consumption on spark-submit client. - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/13 05:01:29 UTC, 3 replies.
- Java: Return type of RDDFunctions.sliding(int, int) - posted by tgodden <tg...@vub.ac.be> on 2016/05/13 07:40:53 UTC, 6 replies.
- The metastore database gives errors when start spark-sql CLI. - posted by Joseph <wx...@sina.com> on 2016/05/13 08:09:07 UTC, 1 replies.
- Re: Kafka partition increased while Spark Streaming is running - posted by chandan prakash <ch...@gmail.com> on 2016/05/13 09:24:06 UTC, 2 replies.
- ANOVA test in Spark - posted by mayankshete <ma...@yash.com> on 2016/05/13 10:24:51 UTC, 2 replies.
- Creating Nested dataframe from flat data. - posted by Prashant Bhardwaj <pr...@gmail.com> on 2016/05/13 11:51:05 UTC, 2 replies.
- SparkSql Catalyst extending Analyzer, Error with CatalystConf - posted by Alexander Sibetheros <al...@gmail.com> on 2016/05/13 12:11:52 UTC, 1 replies.
- memory leak exception - posted by Imran Akbar <sk...@gmail.com> on 2016/05/13 14:43:55 UTC, 0 replies.
- Spark 2.0.0-snapshot: IllegalArgumentException: requirement failed: chunks must be non-empty - posted by Raghava Mutharaju <m....@gmail.com> on 2016/05/13 15:13:27 UTC, 1 replies.
- pandas dataframe broadcasted. giving errors in datanode function called kernel - posted by abi <an...@gmail.com> on 2016/05/13 16:59:06 UTC, 1 replies.
- Tracking / estimating job progress - posted by Do...@ODDO, od...@gmail.com on 2016/05/13 17:05:52 UTC, 5 replies.
- strange behavior when I chain data frame transformations - posted by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/05/13 18:49:15 UTC, 2 replies.
- API to study key cardinality and distribution and other important statistics about data at certain stage - posted by Nirav Patel <np...@xactlycorp.com> on 2016/05/13 19:04:24 UTC, 0 replies.
- Executor memory requirement for reduceByKey - posted by Sung Hwan Chung <co...@cs.stanford.edu> on 2016/05/13 19:14:24 UTC, 4 replies.
- System memory 186646528 must be at least 4.718592E8. - posted by satish saley <sa...@gmail.com> on 2016/05/13 19:47:06 UTC, 2 replies.
- Spark job fails when using checkpointing if a class change in the job - posted by map reduced <k3...@gmail.com> on 2016/05/13 19:50:46 UTC, 0 replies.
- broadcast variable not picked up - posted by abi <an...@gmail.com> on 2016/05/13 22:53:50 UTC, 1 replies.
- support for golang - posted by Sourav Chakraborty <so...@gmail.com> on 2016/05/14 02:21:39 UTC, 1 replies.
- Spark 1.4.1 + Kafka 0.8.2 with Kerberos - posted by "Mail.com" <pr...@mail.com> on 2016/05/14 03:48:01 UTC, 0 replies.
- How to limit search range without using subquery when query SQL DB via JDBC? - posted by Jyun-Fan Tsai <jy...@gmail.com> on 2016/05/14 03:56:39 UTC, 1 replies.
- Issue with Spark Streaming UI - posted by Sachin Janani <sj...@snappydata.io> on 2016/05/14 06:26:17 UTC, 2 replies.
- spark sql write orc table on viewFS throws exception - posted by linxi zeng <li...@gmail.com> on 2016/05/15 03:01:50 UTC, 1 replies.
- "collecting" DStream data - posted by Daniel Haviv <da...@veracity-group.com> on 2016/05/15 10:23:51 UTC, 2 replies.
- Structured Streaming in Spark 2.0 and DStreams - posted by "Yuval.Itzchakov" <yu...@gmail.com> on 2016/05/15 10:52:24 UTC, 11 replies.
- Executors and Cores - posted by "Mail.com" <pr...@mail.com> on 2016/05/15 12:19:46 UTC, 5 replies.
- orgin of error - posted by pseudo oduesp <ps...@gmail.com> on 2016/05/15 15:47:06 UTC, 2 replies.
- spark udf can not change a json string to a map - posted by 喜之郎 <25...@qq.com> on 2016/05/15 16:18:38 UTC, 2 replies.
- How to use the spark submit script / capability - posted by Stephen Boesch <ja...@gmail.com> on 2016/05/15 16:33:01 UTC, 4 replies.
- pyspark.zip and py4j-0.9-src.zip - posted by satish saley <sa...@gmail.com> on 2016/05/15 18:55:58 UTC, 1 replies.
- JDBC SQL Server RDD - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2016/05/15 19:05:03 UTC, 2 replies.
- Errors when running SparkPi on a clean Spark 1.6.1 on Mesos - posted by Richard Siebeling <rs...@gmail.com> on 2016/05/15 21:50:31 UTC, 4 replies.
- Kafka stream message sampling - posted by Samuel Zhou <zh...@gmail.com> on 2016/05/15 22:24:46 UTC, 2 replies.
- Re: spark udf can not change a json string to a map - posted by 喜之郎 <25...@qq.com> on 2016/05/16 02:00:50 UTC, 1 replies.
- Debug spark core and streaming programs in scala - posted by Deepak Sharma <de...@gmail.com> on 2016/05/16 05:25:26 UTC, 2 replies.
- Issue with creation of EC2 cluster using spark scripts - posted by Marco Mistroni <mm...@gmail.com> on 2016/05/16 08:37:05 UTC, 0 replies.
- Renaming nested columns in dataframe - posted by Prashant Bhardwaj <pr...@gmail.com> on 2016/05/16 11:58:05 UTC, 0 replies.
- What / Where / When / How questions in Spark 2.0 ? - posted by Ovidiu-Cristian MARCU <ov...@inria.fr> on 2016/05/16 12:18:20 UTC, 5 replies.
- GC overhead limit exceeded - posted by AlexModestov <Al...@gmail.com> on 2016/05/16 13:00:15 UTC, 3 replies.
- Apache Spark Slack - posted by Paweł Szulc <pa...@gmail.com> on 2016/05/16 13:40:55 UTC, 6 replies.
- Monitoring Spark application progress - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/05/16 16:13:07 UTC, 2 replies.
- how to add one more column in DataFrame - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/05/16 17:16:28 UTC, 0 replies.
- KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder. - posted by "Ramaswamy, Muthuraman" <Mu...@viasat.com> on 2016/05/16 17:33:58 UTC, 9 replies.
- Silly Question on my part... - posted by Michael Segel <ms...@hotmail.com> on 2016/05/16 19:12:37 UTC, 0 replies.
- How to get the batch information from Streaming UI - posted by Samuel Zhou <zh...@gmail.com> on 2016/05/16 21:18:26 UTC, 1 replies.
- Re: Silly Question on my part... - posted by John Trengrove <jo...@servian.com.au> on 2016/05/17 00:21:40 UTC, 3 replies.
- Why spark 1.6.1 master can not monitor and start a auto stop worker? - posted by sunday2000 <23...@qq.com> on 2016/05/17 01:54:07 UTC, 2 replies.
- Re: Why spark 1.6.1 master can not monitor and start a auto stop worker? - posted by sunday2000 <23...@qq.com> on 2016/05/17 02:19:47 UTC, 1 replies.
- what is the wrong while adding one column in the dataframe - posted by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/05/17 02:44:41 UTC, 0 replies.
- Will spark swap memory out to disk if the memory is not enough? - posted by "kramer2009@126.com" <kr...@126.com> on 2016/05/17 03:09:10 UTC, 1 replies.
- question about Union in pyspark and preserving partitioners - posted by Cameron Davidson-Pilon <ca...@shopify.com> on 2016/05/17 05:50:14 UTC, 0 replies.
- Code Example of Structured Streaming of 2.0 - posted by Todd <bi...@163.com> on 2016/05/17 06:46:28 UTC, 2 replies.
- Adding a new column to a temporary table - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/17 07:22:55 UTC, 0 replies.
- Does Structured Streaming support count(distinct) over all the streaming data? - posted by Todd <bi...@163.com> on 2016/05/17 08:36:18 UTC, 6 replies.
- SparkR query - posted by Mike Lewis <ML...@nephilaadvisors.co.uk> on 2016/05/17 10:07:12 UTC, 3 replies.
- Why does spark 1.6.0 can't use jar files stored on HDFS - posted by Serega Sheypak <se...@gmail.com> on 2016/05/17 12:33:21 UTC, 6 replies.
- spark job is not running on yarn clustor mode - posted by sp...@yahoo.com.INVALID on 2016/05/17 12:38:53 UTC, 3 replies.
- yarn-cluster mode error - posted by sp...@yahoo.com.INVALID on 2016/05/17 13:30:55 UTC, 1 replies.
- dataframe stat corr for multiple columns - posted by Ankur Jain <an...@yash.com> on 2016/05/17 14:09:25 UTC, 2 replies.
- What's the best way to find the Nearest Neighbor row of a matrix with 10billion rows x 300 columns? - posted by Rex X <dn...@gmail.com> on 2016/05/17 16:24:11 UTC, 1 replies.
- Error joining dataframes - posted by ram kumar <ra...@gmail.com> on 2016/05/17 16:39:57 UTC, 12 replies.
- Pls Assist: error when creating cluster on AWS using spark's ec2 scripts - posted by Marco Mistroni <mm...@gmail.com> on 2016/05/17 17:11:20 UTC, 0 replies.
- duplicate jar problem in yarn-cluster mode - posted by satish saley <sa...@gmail.com> on 2016/05/17 18:46:14 UTC, 1 replies.
- Inferring schema from GenericRowWithSchema - posted by Andy Grove <an...@agildata.com> on 2016/05/17 18:48:45 UTC, 2 replies.
- How to run hive queries in async mode using spark sql - posted by Raju Bairishetti <ra...@gmail.com> on 2016/05/18 01:03:23 UTC, 4 replies.
- Why spark 1.6.1 wokers auto stopped and can not register with master:Worker registration failed: Duplicate worker ID? - posted by sunday2000 <23...@qq.com> on 2016/05/18 01:40:26 UTC, 0 replies.
- How to use Kafka as data source for Structured Streaming - posted by Todd <bi...@163.com> on 2016/05/18 02:14:58 UTC, 0 replies.
- Re: My notes on Spark Performance & Tuning Guide - posted by 谭成灶 <ta...@live.cn> on 2016/05/18 03:04:14 UTC, 1 replies.
- Re: How to use Kafka as data source for Structured Streaming - posted by Saisai Shao <sa...@gmail.com> on 2016/05/18 03:24:33 UTC, 0 replies.
- How to change output mode to Update - posted by Todd <bi...@163.com> on 2016/05/18 03:55:01 UTC, 6 replies.
- Load Table as DataFrame - posted by Mohanraj Ragupathiraj <mo...@gmail.com> on 2016/05/18 04:04:48 UTC, 2 replies.
- SPARK - DataFrame for BulkLoad - posted by Mohanraj Ragupathiraj <mo...@gmail.com> on 2016/05/18 04:14:34 UTC, 3 replies.
- Can Pyspark access Scala API? - posted by Abi <an...@gmail.com> on 2016/05/18 04:16:44 UTC, 3 replies.
- HBase / Spark Kerberos problem - posted by ph...@thomsonreuters.com on 2016/05/18 08:13:34 UTC, 5 replies.
- File not found exception while reading from folder using textFileStream - posted by Yogesh Vyas <in...@gmail.com> on 2016/05/18 09:06:09 UTC, 2 replies.
- [Spark 2.0 state store] Streaming wordcount using spark state store - posted by Shekhar Bansal <sh...@yahoo.com.INVALID> on 2016/05/18 10:50:59 UTC, 1 replies.
- Managed memory leak detected.SPARK-11293 ? - posted by Serega Sheypak <se...@gmail.com> on 2016/05/18 11:17:00 UTC, 4 replies.
- Spark Task not serializable with lag Window function - posted by luca_guerra <lg...@bitbang.com> on 2016/05/18 13:42:18 UTC, 0 replies.
- Submit python egg? - posted by Darren Govoni <da...@ontrenet.com> on 2016/05/18 15:13:44 UTC, 0 replies.
- Re: 2 tables join happens at Hive but not in spark - posted by Davies Liu <da...@databricks.com> on 2016/05/18 17:43:09 UTC, 0 replies.
- SLF4J binding error while running Spark using YARN as Cluster Manager - posted by Anubhav Agarwal <an...@gmail.com> on 2016/05/18 17:59:49 UTC, 1 replies.
- Re: Unit testing framework for Spark Jobs? - posted by swetha kasireddy <sw...@gmail.com> on 2016/05/18 18:14:08 UTC, 2 replies.
- Couldn't find leader offsets - posted by samsayiam <ha...@gmail.com> on 2016/05/18 21:04:23 UTC, 2 replies.
- Is there a way to run a jar built for scala 2.11 on spark 1.6.1 (which is using 2.10?) - posted by Sergey Zelvenskiy <se...@actions.im> on 2016/05/18 22:11:16 UTC, 2 replies.
- Spark UI metrics - Task execution time and number of records processed - posted by Nirav Patel <np...@xactlycorp.com> on 2016/05/19 01:40:35 UTC, 0 replies.
- Does Structured Streaming support Kafka as data source? - posted by Todd <bi...@163.com> on 2016/05/19 02:55:06 UTC, 1 replies.
- How to perform reduce operation in the same order as partition indexes - posted by Pulasthi Supun Wickramasinghe <pu...@gmail.com> on 2016/05/19 03:22:09 UTC, 2 replies.
- Latency experiment without losing executors - posted by gkumar7 <gk...@hawk.iit.edu> on 2016/05/19 04:41:13 UTC, 3 replies.
- Any way to pass custom hadoop conf to through spark thrift server ? - posted by Jeff Zhang <zj...@gmail.com> on 2016/05/19 06:40:11 UTC, 0 replies.
- Tar File: On Spark - posted by ayan guha <gu...@gmail.com> on 2016/05/19 06:42:20 UTC, 4 replies.
- Filter out the elements from xml file in Spark - posted by Yogesh Vyas <in...@gmail.com> on 2016/05/19 08:39:30 UTC, 1 replies.
- Spark Streaming Application run on yarn-clustor mode - posted by sp...@yahoo.com.INVALID on 2016/05/19 14:24:03 UTC, 1 replies.
- Hive 2 database Entity-Relationship Diagram - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/19 17:02:07 UTC, 3 replies.
- Starting executor without a master - posted by Mathieu Longtin <ma...@closetwork.org> on 2016/05/19 17:45:48 UTC, 19 replies.
- Spark log collection via Logstash - posted by Ashish Kumar Singh <as...@gmail.com> on 2016/05/19 18:05:37 UTC, 0 replies.
- Splitting RDD by partition - posted by shlomi <sh...@gmail.com> on 2016/05/19 20:59:01 UTC, 2 replies.
- Does spark support Apache Arrow - posted by Todd <bi...@163.com> on 2016/05/20 02:16:17 UTC, 2 replies.
- Query about how to estimate cpu usage for spark - posted by Wang Jiaye <ru...@gmail.com> on 2016/05/20 03:35:47 UTC, 1 replies.
- Is there a way to merge parquet small files? - posted by 王晓龙/01111515 <ro...@cmbchina.com> on 2016/05/20 03:50:34 UTC, 4 replies.
- Spark CacheManager Thread-safety - posted by Pietro Gentile <pi...@gmail.com> on 2016/05/20 09:34:52 UTC, 1 replies.
- Dataset API and avro type - posted by Han JU <ju...@gmail.com> on 2016/05/20 09:37:02 UTC, 3 replies.
- Spark.default.parallelism can not set reduce number - posted by 喜之郎 <25...@qq.com> on 2016/05/20 11:17:27 UTC, 1 replies.
- Re: Spark.default.parallelism can not set reduce number - posted by Takeshi Yamamuro <li...@gmail.com> on 2016/05/20 11:20:06 UTC, 0 replies.
- rpc.RpcTimeoutException: Futures timed out after [120 seconds] - posted by Sahil Sareen <sa...@gmail.com> on 2016/05/20 13:32:13 UTC, 3 replies.
- StackOverflowError in Spark SQL - posted by Jeff Jones <jj...@adaptivebiotech.com> on 2016/05/20 15:15:07 UTC, 0 replies.
- Problems finding the original objects after HashingTF() - posted by Pasquinell Urbani <pa...@exalitica.com> on 2016/05/20 16:45:54 UTC, 0 replies.
- Wide Datasets (v1.6.1) - posted by Don Drake <do...@gmail.com> on 2016/05/20 17:23:14 UTC, 2 replies.
- Can not set spark dynamic resource allocation - posted by "Cui, Weifeng" <we...@a9.com> on 2016/05/20 18:48:29 UTC, 9 replies.
- Logstash to collect Spark logs - posted by Ashish Kumar Singh <as...@gmail.com> on 2016/05/20 21:04:34 UTC, 0 replies.
- Memory issues when trying to insert data in the form of ORC using Spark SQL - posted by SRK <sw...@gmail.com> on 2016/05/20 22:43:13 UTC, 1 replies.
- set spark 1.6 with Hive 0.14 ? - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2016/05/20 23:57:00 UTC, 5 replies.
- What factors decide the number of executors when doing a Spark SQL insert in Mesos? - posted by SRK <sw...@gmail.com> on 2016/05/20 23:58:43 UTC, 0 replies.
- A bug with RDD Storage Info and links to sort rows? - posted by Jacek Laskowski <ja...@japila.pl> on 2016/05/21 02:04:14 UTC, 0 replies.
- Spark Streaming S3 Error - posted by Benjamin Kim <bb...@gmail.com> on 2016/05/21 03:31:42 UTC, 3 replies.
- How to avoid empty unavoidable group by keys in DataFrame? - posted by unk1102 <um...@gmail.com> on 2016/05/21 09:12:14 UTC, 0 replies.
- Re: spark on yarn - posted by Shushant Arora <sh...@gmail.com> on 2016/05/21 14:14:46 UTC, 2 replies.
- How to carry data streams over multiple batch intervals in Spark Streaming - posted by Marco Platania <ma...@yahoo.it.INVALID> on 2016/05/21 16:28:52 UTC, 1 replies.
- Spark 2.0 - SQL Subqueries. - posted by Kamalesh Nair <ka...@gmail.com> on 2016/05/21 17:49:24 UTC, 1 replies.
- Does DataFrame has something like set hive.groupby.skewindata=true; - posted by unk1102 <um...@gmail.com> on 2016/05/21 20:48:08 UTC, 1 replies.
- Hive 2.0 on Spark 1.6.1 Engine - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/21 22:11:38 UTC, 0 replies.
- How to set the degree of parallelism in Spark SQL? - posted by SRK <sw...@gmail.com> on 2016/05/22 03:31:44 UTC, 5 replies.
- Structured Streaming for tweets - posted by singinpirate <th...@gmail.com> on 2016/05/22 05:56:21 UTC, 0 replies.
- How to insert data for 100 partitions at a time using Spark SQL - posted by SRK <sw...@gmail.com> on 2016/05/22 07:34:36 UTC, 15 replies.
- Hive 2 Metastore Entity-Relationship Diagram, Base tables - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/22 07:36:12 UTC, 1 replies.
- How to map values read from text file to 2 different set of RDDs - posted by Deepak Sharma <de...@gmail.com> on 2016/05/22 09:39:04 UTC, 0 replies.
- Unsubscribe - posted by Shekhar Kumar <sh...@outlook.com> on 2016/05/22 10:25:06 UTC, 0 replies.
- How to change Spark DataFrame groupby("col1",..,"coln") into reduceByKey()? - posted by unk1102 <um...@gmail.com> on 2016/05/22 10:34:02 UTC, 0 replies.
- Handling Empty RDD - posted by Yogesh Vyas <in...@gmail.com> on 2016/05/22 12:17:39 UTC, 2 replies.
- Seeing issues Jobs failing using yarn for setting spark.master=yarn-client in Hive or in mapred for mapreduce.framework.name - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/22 14:53:44 UTC, 1 replies.
- Dataset kryo encoder fails on Collections$UnmodifiableCollection - posted by Amit Sela <am...@gmail.com> on 2016/05/22 21:50:31 UTC, 2 replies.
- Using Spark on Hive with Hive also using Spark as its execution engine - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/22 23:42:10 UTC, 18 replies.
- How spark depends on Guava - posted by Todd <bi...@163.com> on 2016/05/23 03:48:58 UTC, 5 replies.
- Spark for offline log processing/querying - posted by Mat Schaffer <ma...@schaffer.me> on 2016/05/23 05:28:34 UTC, 4 replies.
- Spark Streaming - Exception thrown while writing record: BlockAdditionEvent - posted by Ewan Leith <ew...@realitymine.com> on 2016/05/23 08:36:10 UTC, 0 replies.
- Re: How to integrate Spark with OpenCV? - posted by Jishnu Prathap <ji...@wipro.com> on 2016/05/23 08:53:36 UTC, 0 replies.
- Spark job is failing with kerberos error while creating hive context in yarn-cluster mode (through spark-submit) - posted by Chandraprakash Bhagtani <cp...@gmail.com> on 2016/05/23 11:41:13 UTC, 5 replies.
- spark streaming: issue with logging with separate log4j properties files for driver and executor - posted by chandan prakash <ch...@gmail.com> on 2016/05/23 11:48:18 UTC, 2 replies.
- odd python.PythonRunner Times values? - posted by Adrian Bridgett <ad...@opensignal.com> on 2016/05/23 11:49:19 UTC, 0 replies.
- how to config spark thrift jdbc server high available - posted by qmzhang <57...@qq.com> on 2016/05/23 12:10:26 UTC, 3 replies.
- why spark 1.6 use Netty instead of Akka? - posted by Chaoqiang <hc...@aliyun.com> on 2016/05/23 13:19:08 UTC, 5 replies.
- Error making REST call from streaming app - posted by "Afshartous, Nick" <na...@turbine.com> on 2016/05/23 14:36:59 UTC, 0 replies.
- sqlContext.read.format("libsvm") not working with spark 1.6+ - posted by dbspace <db...@yahoo.com> on 2016/05/23 15:08:51 UTC, 0 replies.
- TFIDF question - posted by Pasquinell Urbani <pa...@exalitica.com> on 2016/05/23 15:11:59 UTC, 0 replies.
- How to map values read from test file to 2 different RDDs - posted by Deepak Sharma <de...@gmail.com> on 2016/05/23 15:14:16 UTC, 0 replies.
- What is the minimum value allowed for StreamingContext's Seconds parameter? - posted by YaoPau <jo...@gmail.com> on 2016/05/23 15:25:21 UTC, 2 replies.
- Hive_context - posted by Ajay Chander <it...@gmail.com> on 2016/05/23 19:51:03 UTC, 3 replies.
- Timed aggregation in Spark - posted by Nikhil Goyal <no...@gmail.com> on 2016/05/23 20:28:03 UTC, 3 replies.
- Spark JOIN Not working - posted by Aakash Basu <ra...@gmail.com> on 2016/05/24 06:43:54 UTC, 1 replies.
- Spark Streaming with Redis - posted by Pariksheet Barapatre <pb...@gmail.com> on 2016/05/24 07:58:32 UTC, 2 replies.
- Not able to write output to local filsystem from Standalone mode. - posted by Stuti Awasthi <st...@hcl.com> on 2016/05/24 09:26:50 UTC, 7 replies.
- Using HiveContext.set in multipul threads - posted by Amir Gershman <am...@fb.com> on 2016/05/24 11:01:55 UTC, 1 replies.
- Possible bug involving Vectors with a single element - posted by flyinggip <my...@hotmail.com> on 2016/05/24 14:27:01 UTC, 1 replies.
- About an runtime error when trying to recover a tuple from a kafka topic using spark streaming and scala - posted by Alonso Isidoro Roman <al...@gmail.com> on 2016/05/24 16:15:08 UTC, 0 replies.
- Re: Spark Streaming with Kafka - posted by Rasika Pohankar <ra...@gmail.com> on 2016/05/24 17:58:01 UTC, 0 replies.
- Re: How to read *.jhist file in Spark using scala - posted by Miles <ga...@gmail.com> on 2016/05/24 18:36:56 UTC, 0 replies.
- Maintain kafka offset externally as Spark streaming processes records. - posted by "sagarcasual ." <sa...@gmail.com> on 2016/05/24 19:07:17 UTC, 1 replies.
- Spark-submit hangs indefinitely after job completion. - posted by Pradeep Nayak <pr...@gmail.com> on 2016/05/24 19:07:56 UTC, 5 replies.
- Error publishing to spark-packages - posted by Neville Li <ne...@gmail.com> on 2016/05/24 19:29:10 UTC, 0 replies.
- How does Spark set task indexes? - posted by Adrien Mogenet <ad...@contentsquare.com> on 2016/05/24 20:00:11 UTC, 2 replies.
- Error while saving plots - posted by njoshi <ni...@teamaol.com> on 2016/05/24 20:37:31 UTC, 1 replies.
- Dataset Set Operations - posted by Tim Gautier <ti...@gmail.com> on 2016/05/24 22:46:35 UTC, 1 replies.
- job build cost more and more time - posted by naliazheli <75...@qq.com> on 2016/05/25 01:43:10 UTC, 1 replies.
- Using Java in Spark shell - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/05/25 05:11:50 UTC, 2 replies.
- Cartesian join on RDDs taking too much time - posted by Priya Ch <le...@gmail.com> on 2016/05/25 07:05:12 UTC, 13 replies.
- python application cluster mode in standalone spark cluster - posted by Jan Sourek <ja...@performio.cz> on 2016/05/25 07:13:33 UTC, 0 replies.
- run multiple spark jobs yarn-client mode - posted by sp...@yahoo.com.INVALID on 2016/05/25 07:23:11 UTC, 5 replies.
- Spark Streaming - Kafka - java.nio.BufferUnderflowException - posted by Scott W <de...@gmail.com> on 2016/05/25 07:30:00 UTC, 1 replies.
- never understand - posted by pseudo oduesp <ps...@gmail.com> on 2016/05/25 08:32:01 UTC, 1 replies.
- Facing issues while reading parquet file in spark 1.2.1 - posted by vaibhav srivastava <va...@gmail.com> on 2016/05/25 11:27:15 UTC, 2 replies.
- StackOverflow in Spark - posted by Michel Hubert <mi...@phact.nl> on 2016/05/25 12:17:00 UTC, 0 replies.
- Spark Streaming - Kafka Direct Approach: re-compute from specific time - posted by trung kien <ki...@gmail.com> on 2016/05/25 13:15:50 UTC, 4 replies.
- about an exception when receiving data from kafka topic using Direct mode of Spark Streaming - posted by Alonso Isidoro Roman <al...@gmail.com> on 2016/05/25 13:19:34 UTC, 7 replies.
- Pros and Cons - posted by Aakash Basu <ra...@gmail.com> on 2016/05/25 15:34:17 UTC, 13 replies.
- feedback on dataset api explode - posted by Koert Kuipers <ko...@tresata.com> on 2016/05/25 15:49:46 UTC, 6 replies.
- Accumulators displayed in SparkUI in 1.4.1? - posted by Daniel Barclay <da...@gmail.com> on 2016/05/25 15:57:25 UTC, 1 replies.
- sparkApp on standalone/local mode with multithreading - posted by sujeet jog <su...@gmail.com> on 2016/05/25 16:49:42 UTC, 0 replies.
- Preference and confidence in ALS implicit preferences output? - posted by edezhath <ra...@gmail.com> on 2016/05/25 16:50:42 UTC, 1 replies.
- GraphFrame graph partitioning - posted by rohit13k <ro...@gmail.com> on 2016/05/25 16:52:23 UTC, 0 replies.
- The 7th and Largest Spark Summit is less than 2 weeks away! - posted by Scott walent <sc...@gmail.com> on 2016/05/25 18:18:09 UTC, 0 replies.
- unsure how to create 2 outputs from spark-sql udf expression - posted by Koert Kuipers <ko...@tresata.com> on 2016/05/25 20:11:23 UTC, 6 replies.
- Spark UI doesn't give visibility on which stage job actually failed (due to lazy eval nature) - posted by Nirav Patel <np...@xactlycorp.com> on 2016/05/26 00:28:42 UTC, 4 replies.
- User impersonation with Kerberos and Delegation tokens - posted by Sudarshan Rangarajan <tr...@gmail.com> on 2016/05/26 00:49:15 UTC, 0 replies.
- Kafka connection logs in Spark - posted by "Mail.com" <pr...@mail.com> on 2016/05/26 02:41:13 UTC, 3 replies.
- Re: Release Announcement: XGBoost4J - Portable Distributed XGBoost in Spark, Flink and Dataflow - posted by Selvam Raman <se...@gmail.com> on 2016/05/26 03:06:56 UTC, 0 replies.
- How to run large Hive queries in PySpark 1.2.1 - posted by Nikolay Voronchikhin <nv...@gmail.com> on 2016/05/26 08:10:42 UTC, 1 replies.
- Does decimal(6,-2) exists on purpose? - posted by Ofir Manor <of...@equalum.io> on 2016/05/26 08:51:14 UTC, 0 replies.
- HiveContext standalone => without a Hive metastore - posted by Gerard Maas <ge...@gmail.com> on 2016/05/26 09:28:18 UTC, 8 replies.
- System.exit in local mode ? - posted by yael aharon <ya...@gmail.com> on 2016/05/26 12:49:04 UTC, 0 replies.
- Apache Spark Video Processing from NFS Shared storage: Advise needed - posted by mobcdi <b0...@student.itb.ie> on 2016/05/26 13:53:04 UTC, 0 replies.
- save RDD of Avro GenericRecord as parquet throws UnsupportedOperationException - posted by "Govindasamy, Nagarajan" <ng...@turbine.com> on 2016/05/26 13:55:08 UTC, 3 replies.
- Spark Job Execution halts during shuffle... - posted by Priya Ch <le...@gmail.com> on 2016/05/26 14:40:52 UTC, 2 replies.
- JDBC Dialect for saving DataFrame into Vertica Table - posted by Aaron Ilovici <ai...@wayfair.com> on 2016/05/26 15:08:26 UTC, 3 replies.
- Distributed matrices with column counts represented by Int (rather than Long) - posted by Phillip Henry <lo...@gmail.com> on 2016/05/26 16:06:05 UTC, 0 replies.
- Subtract two DataFrames is not working - posted by Gurusamy Thirupathy <th...@gmail.com> on 2016/05/26 16:44:32 UTC, 1 replies.
- Re: List of questios about spark - posted by Ian <ps...@gmail.com> on 2016/05/26 17:57:26 UTC, 1 replies.
- Re: Problem instantiation of HiveContext - posted by Ian <ps...@gmail.com> on 2016/05/26 18:13:26 UTC, 0 replies.
- Spark input size when filtering on parquet files - posted by Dennis Hunziker <de...@gmail.com> on 2016/05/26 20:45:16 UTC, 1 replies.
- Insert into JDBC - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/05/26 22:02:08 UTC, 4 replies.
- Sample scala program using CMU Sphinx4 - posted by Vajra L <va...@gmail.com> on 2016/05/27 04:09:17 UTC, 0 replies.
- Spark Streaming: Combine MLlib Prediction and Features on Dstreams - posted by obaidul karim <ob...@gmail.com> on 2016/05/27 04:33:27 UTC, 10 replies.
- submitMissingTasks - serialize throws StackOverflow exception - posted by Michel Hubert <mi...@phact.nl> on 2016/05/27 06:55:46 UTC, 0 replies.
- Logistic Regression in Spark Streaming - posted by kundan kumar <ii...@gmail.com> on 2016/05/27 07:09:24 UTC, 2 replies.
- problem about RDD map and then saveAsTextFile - posted by Reminia Scarlet <re...@gmail.com> on 2016/05/27 09:53:45 UTC, 1 replies.
- DIMSUM among 550k objects on AWS Elastic Map Reduce fails with OOM errors - posted by nmoretto <ni...@tyk.li> on 2016/05/27 10:38:28 UTC, 0 replies.
- GraphX Java API - posted by "Kumar, Abhishek (US - Bengaluru)" <ab...@deloitte.com> on 2016/05/27 10:58:43 UTC, 10 replies.
- pyspark.GroupedData.agg works incorrectly when one column is aggregated twice? - posted by Andrew Vykhodtsev <yo...@gmail.com> on 2016/05/27 11:28:00 UTC, 0 replies.
- Python memory included YARN-monitored memory? - posted by Mike Sukmanowsky <mi...@gmail.com> on 2016/05/27 14:11:43 UTC, 0 replies.
- JDBC Create Table - posted by Andrés Ivaldi <ia...@gmail.com> on 2016/05/27 14:37:00 UTC, 2 replies.
- I'm pretty sure this is a Dataset bug - posted by Tim Gautier <ti...@gmail.com> on 2016/05/27 15:24:51 UTC, 8 replies.
- Spark_API_Copy_From_Edgenode - posted by Ajay Chander <it...@gmail.com> on 2016/05/27 16:27:55 UTC, 1 replies.
- Need some clarification on this diagram of mine depicting Hive on Spark engine in yarn-client mode - posted by Mich Talebzadeh <mi...@gmail.com> on 2016/05/27 17:11:35 UTC, 0 replies.
- Undocumented left join constraint? - posted by Tim Gautier <ti...@gmail.com> on 2016/05/27 20:01:45 UTC, 4 replies.
- Range partition for parquet file? - posted by Rex Xiong <by...@gmail.com> on 2016/05/27 20:09:30 UTC, 0 replies.
- Spark Streaming - Is window() caching DStreams? - posted by Marco Platania <ma...@yahoo.it.INVALID> on 2016/05/27 20:16:35 UTC, 1 replies.
- local Vs Standalonecluster production deployment - posted by sujeet jog <su...@gmail.com> on 2016/05/28 15:42:58 UTC, 13 replies.
- Multinomial regression with spark.ml version of LogisticRegression - posted by Stephen Boesch <ja...@gmail.com> on 2016/05/28 16:06:32 UTC, 7 replies.
- join function in a loop - posted by heri wijayanto <he...@gmail.com> on 2016/05/28 22:27:42 UTC, 7 replies.
- Accessing s3a files from Spark - posted by Mayuresh Kunjir <ma...@cs.duke.edu> on 2016/05/29 21:55:00 UTC, 9 replies.
- Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException - posted by Nirav Patel <np...@xactlycorp.com> on 2016/05/29 23:26:54 UTC, 6 replies.
- G1 GC takes too much time - posted by condor join <sp...@outlook.com> on 2016/05/30 01:15:26 UTC, 1 replies.
- Re: G1 GC takes too much time - posted by condor join <sp...@outlook.com> on 2016/05/30 02:17:11 UTC, 1 replies.
- Re: Re: G1 GC takes too much time - posted by Sea <26...@qq.com> on 2016/05/30 02:33:01 UTC, 0 replies.
- Preview release of Spark 2.0 - posted by charles li <ch...@gmail.com> on 2016/05/30 05:20:13 UTC, 0 replies.
- Bug of PolynomialExpansion ? - posted by Jeff Zhang <zj...@gmail.com> on 2016/05/30 05:37:16 UTC, 1 replies.
- Query related to spark cluster - posted by "Kumar, Saurabh 5. (Nokia - IN/Bangalore)" <sa...@nokia.com> on 2016/05/30 06:38:03 UTC, 4 replies.
- Running glm in sparkR (data pre-processing step) - posted by Abhishek Anand <ab...@gmail.com> on 2016/05/30 09:06:17 UTC, 3 replies.
- Re: JDBC Cluster - posted by Ian <ps...@gmail.com> on 2016/05/30 09:15:28 UTC, 1 replies.
- Re: Launch Spark shell using differnt python version - posted by Eike von Seggern <ei...@sevenval.com> on 2016/05/30 09:20:00 UTC, 0 replies.
- Can we use existing R model in Spark - posted by Neha Mehta <ne...@gmail.com> on 2016/05/30 10:21:19 UTC, 4 replies.
- DAG of Spark Sort application spanning two jobs - posted by alvarobrandon <al...@gmail.com> on 2016/05/30 10:48:00 UTC, 0 replies.
- FAILED_TO_UNCOMPRESS Error - Spark 1.3.1 - posted by Prashant Singh Thakur <pr...@impetus.co.in> on 2016/05/30 10:51:26 UTC, 1 replies.
- Secondary Indexing? - posted by Michael Segel <ms...@hotmail.com> on 2016/05/30 16:08:20 UTC, 4 replies.
- can not use udf in hivethriftserver2 - posted by 喜之郎 <25...@qq.com> on 2016/05/30 16:25:29 UTC, 1 replies.
- Window Operation on Dstream Fails - posted by vinay453 <vi...@gmail.com> on 2016/05/30 17:42:26 UTC, 0 replies.
- Does Spark support updates or deletes on underlying Hive tables - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/05/30 18:37:36 UTC, 1 replies.
- Spark Streaming heap space out of memory - posted by "christian.dancuart@rbc.com" <ch...@rbc.com> on 2016/05/30 18:51:02 UTC, 3 replies.
- Spark + Kafka processing trouble - posted by Malcolm Lockyer <ma...@hapara.com> on 2016/05/31 01:45:36 UTC, 12 replies.
- equvalent beewn join sql and data frame - posted by pseudo oduesp <ps...@gmail.com> on 2016/05/31 03:26:13 UTC, 2 replies.
- Spark SQL Errors - posted by ayan guha <gu...@gmail.com> on 2016/05/31 05:02:31 UTC, 4 replies.
- Fwd: User finding issue in Spark Thrift server - posted by Radhika Kothari <ra...@gmail.com> on 2016/05/31 06:12:28 UTC, 2 replies.
- Re: can not use udf in hivethriftserver2 - posted by 喜之郎 <25...@qq.com> on 2016/05/31 06:36:21 UTC, 0 replies.
- Behaviour of RDD sampling - posted by pbaier <pa...@zalando.de> on 2016/05/31 07:39:27 UTC, 5 replies.
- Spark Thrift Server run job as hive user - posted by Radhika Kothari <ra...@gmail.com> on 2016/05/31 08:01:16 UTC, 4 replies.
- Compute the global rank of the column - posted by "Dai, Kevin" <yu...@paypal.com.INVALID> on 2016/05/31 09:58:17 UTC, 0 replies.
- spark.hadoop.dfs.replication parameter not working for kafka-spark streaming - posted by Abhishek Anand <ab...@gmail.com> on 2016/05/31 10:33:02 UTC, 1 replies.
- processing twitter data - posted by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/05/31 12:04:25 UTC, 0 replies.
- Running R codes in sparkR - posted by Arunkumar Pillai <ar...@gmail.com> on 2016/05/31 12:46:00 UTC, 2 replies.
- Re: Splitting RDD to exact number of partitions - posted by Maciej Sokołowski <ma...@gmail.com> on 2016/05/31 13:13:52 UTC, 8 replies.
- java.io.FileNotFoundException - posted by kishore kumar <ak...@gmail.com> on 2016/05/31 14:30:08 UTC, 0 replies.
- About a problem when mapping a file located within a HDFS vmware cdh-5.7 image - posted by Alonso <al...@gmail.com> on 2016/05/31 16:11:24 UTC, 3 replies.
- how to get file name of record being reading in spark - posted by Vikash Kumar <vi...@gmail.com> on 2016/05/31 17:32:10 UTC, 2 replies.
- Recommended way to close resources in a Spark streaming application - posted by Mohammad Tariq <do...@gmail.com> on 2016/05/31 17:33:03 UTC, 0 replies.
- Debug spark jobs on Intellij - posted by Marcelo Oikawa <ma...@webradar.com> on 2016/05/31 20:18:39 UTC, 4 replies.
- Protobuf class not found exception - posted by Nikhil Goyal <no...@gmail.com> on 2016/05/31 22:26:50 UTC, 1 replies.
- Map tuple to case class in Dataset - posted by Tim Gautier <ti...@gmail.com> on 2016/05/31 23:17:12 UTC, 1 replies.