user@spark.apache.org, 2017-12

- Re: [Spark streaming] No assigned partition error during seek - posted by venkat <me...@gmail.com> on 2017/12/01 01:15:10 UTC, 3 replies.
- Re: Writing files to s3 with out temporary directory - posted by Steve Loughran <st...@hortonworks.com> on 2017/12/01 12:33:12 UTC, 0 replies.
- Re: Getting Message From Structured Streaming Format Kafka - posted by Daniel de Oliveira Mantovani <da...@gmail.com> on 2017/12/01 21:52:12 UTC, 0 replies.
- What should LivyUrl be set to when running locally? - posted by kant kodali <ka...@gmail.com> on 2017/12/01 23:28:59 UTC, 1 reply.
- Re: NLTK with Spark Streaming - posted by ashish rawat <dc...@gmail.com> on 2017/12/02 02:45:10 UTC, 0 replies.
- Is Databricks REST API open source ? - posted by kant kodali <ka...@gmail.com> on 2017/12/03 04:29:24 UTC, 2 replies.
- Recommended way to serialize Hadoop Writables' in Spark - posted by pradeepbaji <pr...@gmail.com> on 2017/12/03 05:36:51 UTC, 2 replies.
- Question on using pseudo columns in spark jdbc options - posted by "☼ R Nair (रविशंकर नायर)" <ra...@gmail.com> on 2017/12/03 06:39:40 UTC, 2 replies.
- spark datatypes - posted by David Hodefi <da...@gmail.com> on 2017/12/03 14:12:18 UTC, 0 replies.
- Dynamic Resource allocation in Spark Streaming - posted by Sourav Mazumder <so...@gmail.com> on 2017/12/03 17:31:38 UTC, 2 replies.
- Add snappy support for spark in Windows - posted by Junfeng Chen <da...@gmail.com> on 2017/12/04 03:30:23 UTC, 4 replies.
- learning Spark - posted by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2017/12/04 03:48:56 UTC, 5 replies.
- Buffer/cache exhaustion Spark standalone inside a Docker container - posted by Stein Welberg <st...@onegini.com> on 2017/12/04 07:58:33 UTC, 1 reply.
- Re: Programmatically get status of job (WAITING/RUNNING) - posted by bsikander <be...@gmail.com> on 2017/12/04 09:06:07 UTC, 11 replies.
- Re: Access to Applications metrics - posted by Nick Dimiduk <nd...@gmail.com> on 2017/12/04 23:53:00 UTC, 3 replies.
- How to persistent database/table created in sparkSession - posted by 163 <he...@163.com> on 2017/12/05 07:22:15 UTC, 1 reply.
- Support for storing date time fields as TIMESTAMP_MILLIS(INT64) - posted by Rahul Raj <ra...@option3consulting.com> on 2017/12/05 12:17:05 UTC, 0 replies.
- Apache Spark 2.3 and Apache ORC 1.4 finally - posted by Dongjoon Hyun <do...@gmail.com> on 2017/12/05 17:47:47 UTC, 0 replies.
- Do I need to do .collect inside forEachRDD - posted by kant kodali <ka...@gmail.com> on 2017/12/05 20:35:44 UTC, 10 replies.
- How to export the Spark SQL jobs from the HiveThriftServer2 - posted by wenxing zheng <we...@gmail.com> on 2017/12/06 06:08:43 UTC, 1 reply.
- Spark job only starts tasks on a single node - posted by Ji Yan <ji...@drive.ai> on 2017/12/06 06:45:24 UTC, 5 replies.
- unable to connect to connect to cluster 2.2.0 - posted by Imran Rajjad <ra...@gmail.com> on 2017/12/06 07:45:45 UTC, 2 replies.
- [ML] LogisticRegression and dataset's standardization before training - posted by Filipp Zhinkin <fi...@gmail.com> on 2017/12/06 10:13:22 UTC, 0 replies.
- A possible bug? Must call persist to make code run - posted by kwunlyou <zj...@gmail.com> on 2017/12/06 14:58:30 UTC, 0 replies.
- sparkSession.sql("sql query") vs df.sqlContext().sql(this.query) ? - posted by kant kodali <ka...@gmail.com> on 2017/12/06 18:07:13 UTC, 1 reply.
- Explode schema name question - posted by tj5527 <tj...@protonmail.com> on 2017/12/06 23:14:23 UTC, 0 replies.
- Json Parsing. - posted by satyajit vegesna <sa...@gmail.com> on 2017/12/06 23:39:05 UTC, 3 replies.
- Spark ListenerBus - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2017/12/07 05:14:10 UTC, 0 replies.
- Re: LDA and evaluating topic number - posted by Stephen Boesch <ja...@gmail.com> on 2017/12/07 08:15:59 UTC, 0 replies.
- Re: How to write dataframe to kafka topic in spark streaming application using pyspark other than collect? - posted by umargeek <um...@gmail.com> on 2017/12/07 18:20:56 UTC, 0 replies.
- Streaming Analytics/BI tool to connect Spark SQL - posted by umargeek <um...@gmail.com> on 2017/12/07 18:27:03 UTC, 1 reply.
- Best way of shipping self-contained pyspark jobs with 3rd-party dependencies - posted by Sergey Zhemzhitsky <sz...@gmail.com> on 2017/12/07 20:46:04 UTC, 1 reply.
- Row Encoder For DataSet - posted by Sandip Mehta <sa...@gmail.com> on 2017/12/08 03:51:58 UTC, 5 replies.
- RDD[internalRow] -> DataSet - posted by satyajit vegesna <sa...@gmail.com> on 2017/12/08 04:25:16 UTC, 2 replies.
- [Spark SQL]: Dataset can not map into Dataset in java - posted by Himasha de Silva <hi...@gmail.com> on 2017/12/08 04:45:56 UTC, 1 reply.
- UDF issues with spark - posted by "Afshin, Bardia" <ba...@changehealthcare.com> on 2017/12/08 19:54:22 UTC, 1 reply.
- ML Transformer: create feature that uses multiple columns - posted by davideanastasia <da...@gmail.com> on 2017/12/09 11:41:31 UTC, 2 replies.
- JDBC to hive batch use case in spark - posted by Hokam Singh Chauhan <ho...@gmail.com> on 2017/12/09 12:02:06 UTC, 2 replies.
- Structured Streaming + Kafka 0.10. connectors + valueDecoder and messageHandler with python - posted by salemi <al...@udo.edu> on 2017/12/09 18:07:20 UTC, 0 replies.
- Spark + AI Summit CfP Open - posted by Jules Damji <dm...@comcast.net> on 2017/12/09 21:35:20 UTC, 0 replies.
- Save hive table from spark in hive 2.1.1 - posted by konu <al...@gmail.com> on 2017/12/09 22:55:59 UTC, 1 reply.
- [CFP] DataWorks Summit Europe 2018 - Call for abstracts - posted by Yanbo Liang <yb...@gmail.com> on 2017/12/10 00:21:21 UTC, 0 replies.
- Re: Weight column values not used in Binary Logistic Regression Summary - posted by Sea aj <sa...@gmail.com> on 2017/12/10 05:27:04 UTC, 0 replies.
- pyspark + from_json(col("col_name"), schema) returns all null - posted by salemi <al...@udo.edu> on 2017/12/10 06:33:34 UTC, 2 replies.
- Re: Save hive table from spark in hive 2.1.0 - posted by Alejandro Reina <al...@gmail.com> on 2017/12/10 11:17:44 UTC, 2 replies.
- Infer JSON schema in structured streaming Kafka. - posted by satyajit vegesna <sa...@gmail.com> on 2017/12/11 02:28:58 UTC, 6 replies.
- Loading a spark dataframe column into T-Digest using java - posted by Himasha de Silva <hi...@gmail.com> on 2017/12/11 06:27:00 UTC, 1 reply.
- Why Spark 2.2.1 still bundles old Hive jars? - posted by An Qin <aq...@qilinsoft.com> on 2017/12/11 06:43:56 UTC, 1 reply.
- Spark Structured Streaming how to read data from AWS SQS - posted by Bogdan Cojocar <bo...@gmail.com> on 2017/12/11 15:15:20 UTC, 0 replies.
- unsubscribe - posted by Malcolm Croucher <ma...@gmail.com> on 2017/12/11 17:30:30 UTC, 6 replies.
- Writing a UDF that works with an Interval in PySpark - posted by Daniel Haviv <da...@gmail.com> on 2017/12/11 18:15:32 UTC, 0 replies.
- Joining streaming data with static table data. - posted by satyajit vegesna <sa...@gmail.com> on 2017/12/12 00:59:10 UTC, 3 replies.
- Json to csv - posted by Prabha K <pr...@gmail.com> on 2017/12/12 06:44:28 UTC, 1 reply.
- How Fault Tolerance is achieved in Spark ?? - posted by Ni...@ril.com on 2017/12/12 06:51:08 UTC, 1 reply.
- pyspark.sql.utils.AnalysisException: u'Left outer/semi/anti joins with a streaming DataFrame/Dataset on the right is not supported; - posted by salemi <al...@udo.edu> on 2017/12/12 06:55:21 UTC, 0 replies.
- Union of RDDs Hung - posted by Vikash Pareek <vi...@gmail.com> on 2017/12/12 08:02:02 UTC, 1 reply.
- Unsubscribe - posted by Olivier MATRAT <ol...@hotmail.com> on 2017/12/12 17:36:34 UTC, 0 replies.
- How do I save the dataframe data as a pdf file? - posted by shyla deshpande <de...@gmail.com> on 2017/12/12 19:12:45 UTC, 3 replies.
- Access Array StructField inside StructType. - posted by satyajit vegesna <sa...@gmail.com> on 2017/12/13 03:18:21 UTC, 0 replies.
- Spark loads data from HDFS or S3 - posted by Philip Lee <ph...@gmail.com> on 2017/12/13 08:39:52 UTC, 2 replies.
- Determine Cook's distance / influential data points - posted by Richard Siebeling <rs...@gmail.com> on 2017/12/13 13:18:18 UTC, 0 replies.
- Spark Streaming with Confluent - posted by Arkadiusz Bicz <ar...@gmail.com> on 2017/12/13 17:05:29 UTC, 1 reply.
- Apache Spark documentation on mllib's Kmeans doesn't jibe. - posted by Michael Segel <ms...@hotmail.com> on 2017/12/13 17:15:54 UTC, 0 replies.
- Different behaviour when querying a spark DataFrame from dynamodb - posted by Bogdan Cojocar <bo...@gmail.com> on 2017/12/13 17:28:20 UTC, 0 replies.
- Why do I see five attempts on my Spark application - posted by Toy <no...@gmail.com> on 2017/12/13 19:21:00 UTC, 5 replies.
- Re: Apache Spark documentation on mllib's Kmeans doesn't jibe. - posted by Scott Reynolds <sr...@twilio.com.INVALID> on 2017/12/13 21:33:32 UTC, 0 replies.
- is Union or Join Supported for Spark Structured Streaming Queries in 2.2.0? - posted by kant kodali <ka...@gmail.com> on 2017/12/13 22:16:16 UTC, 1 reply.
- How to control logging in testing package com.holdenkarau.spark.testing. - posted by Marco Mistroni <mm...@gmail.com> on 2017/12/13 22:26:08 UTC, 0 replies.
- spark streaming with flume: cannot assign requested address error - posted by Junfeng Chen <da...@gmail.com> on 2017/12/14 02:07:30 UTC, 0 replies.
- bulk upsert data batch from Kafka dstream into Postgres db - posted by salemi <al...@udo.edu> on 2017/12/14 05:52:28 UTC, 4 replies.
- Why this code is errorfull - posted by Soheil Pourbafrani <so...@gmail.com> on 2017/12/14 07:00:13 UTC, 1 reply.
- Fwd: Feature Generation for Large datasets composed of many time series - posted by ju...@free.fr on 2017/12/14 08:28:12 UTC, 0 replies.
- Feature generation / aggregate functions / timeseries - posted by ju...@free.fr on 2017/12/14 08:30:50 UTC, 0 replies.
- cosine similarity in Java Spark - posted by Donni Khan <pr...@googlemail.com> on 2017/12/14 10:39:34 UTC, 0 replies.
- cosine similarity implementation in Java Spark - posted by Donni Khan <pr...@googlemail.com> on 2017/12/14 11:26:41 UTC, 0 replies.
- Spark multithreaded job submission from driver - posted by Michael Artz <mi...@gmail.com> on 2017/12/14 15:02:01 UTC, 0 replies.
- flatMap() returning large class - posted by Don Drake <do...@gmail.com> on 2017/12/14 18:20:19 UTC, 4 replies.
- Re: Feature generation / aggregate functions / timeseries - posted by Georg Heiler <ge...@gmail.com> on 2017/12/14 18:40:06 UTC, 1 reply.
- kinesis throughput problems - posted by Jeremy Kelley <jk...@carbonblack.com> on 2017/12/14 19:03:09 UTC, 2 replies.
- Recompute Spark outputs intelligently - posted by Ashwin Raju <th...@gmail.com> on 2017/12/15 08:00:10 UTC, 0 replies.
- Several Aggregations on a window function - posted by Julien CHAMP <jc...@tellmeplus.com> on 2017/12/15 10:32:30 UTC, 5 replies.
- Using UDF compiled with Janino in Spark - posted by Michael Shtelma <ms...@gmail.com> on 2017/12/15 14:02:27 UTC, 0 replies.
- Please Help with DecisionTree/FeatureIndexer - posted by Marco Mistroni <mm...@gmail.com> on 2017/12/15 22:26:18 UTC, 5 replies.
- NASA CDF files in Spark - posted by Christopher Piggott <cp...@gmail.com> on 2017/12/16 02:33:01 UTC, 2 replies.
- Given a Avro Schema object is there a way to get StructType in Java? - posted by kant kodali <ka...@gmail.com> on 2017/12/16 06:32:02 UTC, 0 replies.
- Windows10 + pyspark + ipython + csv file loading with timestamps - posted by Esa Heikkinen <he...@student.tut.fi> on 2017/12/16 10:04:34 UTC, 1 reply.
- Stateful Aggregation Using flatMapGroupsWithState - posted by Sandip Mehta <sa...@gmail.com> on 2017/12/16 11:09:40 UTC, 0 replies.
- How to...UNION ALL of two SELECTs over different data sources in parallel? - posted by Jacek Laskowski <ja...@japila.pl> on 2017/12/16 11:40:03 UTC, 2 replies.
- Lucene Index with Spark Cassandra - posted by Junaid Nasir <jn...@an10.io> on 2017/12/17 10:02:34 UTC, 0 replies.
- spark + SalesForce SSL HandShake Issue - posted by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2017/12/17 16:21:58 UTC, 0 replies.
- ECOS Spark Integration - posted by Debasish Das <de...@gmail.com> on 2017/12/18 01:25:51 UTC, 0 replies.
- SANSA 0.3 (Scalable Semantic Analytics Stack) Released - posted by Hajira Jabeen <ha...@gmail.com> on 2017/12/18 09:21:11 UTC, 2 replies.
- Spark - Livy - Hive Table User - posted by Sudha KS <Su...@fuzzylogix.com> on 2017/12/18 09:59:25 UTC, 0 replies.
- How to properly execute `foreachPartition` in Spark 2.2 - posted by Liana Napalkova <li...@eurecat.org> on 2017/12/18 14:45:44 UTC, 7 replies.
- Getting multiple regression metrics at once - posted by OBones <ob...@free.fr> on 2017/12/18 16:35:18 UTC, 0 replies.
- Mapping words to vector sparkml CountVectorizerModel - posted by Sandeep Nemuri <nh...@gmail.com> on 2017/12/18 19:43:06 UTC, 0 replies.
- Help Required on Spark - Convert DataFrame to List with out using collect - posted by Sunitha Chennareddy <ch...@gmail.com> on 2017/12/19 03:55:20 UTC, 7 replies.
- What does Blockchain technology mean for Big Data? And how Hadoop/Spark will play role with it? - posted by Gaurav1809 <ga...@gmail.com> on 2017/12/19 04:56:50 UTC, 3 replies.
- NullPointerException while reading a column from the row - posted by Anurag Sharma <an...@logistimo.com> on 2017/12/19 09:23:14 UTC, 1 reply.
- Re: /tmp fills up to 100GB when using a window function - posted by Vadim Semenov <va...@datadoghq.com> on 2017/12/19 14:45:26 UTC, 3 replies.
- Spark error while trying to spark.read.json() - posted by satyajit vegesna <sa...@gmail.com> on 2017/12/20 01:42:21 UTC, 1 reply.
- Can spark shuffle leverage Alluxio to abtain higher stability？ - posted by chopinxb <ch...@gmail.com> on 2017/12/20 09:46:31 UTC, 6 replies.
- Fwd: ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM - posted by Vishal Verma <vi...@exadatum.com> on 2017/12/20 11:29:04 UTC, 0 replies.
- keep sparkContext alive and wait for next job just like spark-shell - posted by CondyZhou <zh...@126.com> on 2017/12/21 03:31:40 UTC, 0 replies.
- Exception in Shutdown-thread, bad file descriptor - posted by Noorul Islam Kamal Malmiyoda <no...@noorul.com> on 2017/12/21 04:22:44 UTC, 0 replies.
- AM restart in a other node makes SparkSQL jobs into a state of feign death - posted by Bang Xiao <ch...@gmail.com> on 2017/12/21 07:02:03 UTC, 0 replies.
- AM restart in a other node make SparkSQL job into a state of feign death - posted by Bang Xiao <ch...@gmail.com> on 2017/12/21 07:03:19 UTC, 0 replies.
- Reading data from OpenTSDB or KairosDB - posted by marko <ma...@nissatech.com> on 2017/12/21 11:27:17 UTC, 2 replies.
- Anyone know where to find independent contractors in New York? - posted by "Richard L. Burton III" <mr...@gmail.com> on 2017/12/21 16:34:32 UTC, 2 replies.
- Spark Streaming to REST API - posted by ashish rawat <dc...@gmail.com> on 2017/12/21 20:32:08 UTC, 2 replies.
- Fwd: [pyspark][MLlib] Getting WARN FPGrowth: Input data is not cached for cached data - posted by Anu B Nair <an...@gmail.com> on 2017/12/22 07:44:24 UTC, 0 replies.
- Passing an array of more than 22 elements in a UDF - posted by Aakash Basu <aa...@gmail.com> on 2017/12/22 09:24:57 UTC, 5 replies.
- How to do stop streaming before the application got killed - posted by Toy <no...@gmail.com> on 2017/12/22 17:56:40 UTC, 0 replies.
- Re: [E] How to do stop streaming before the application got killed - posted by "Rastogi, Pankaj" <pa...@verizon.com> on 2017/12/22 18:14:21 UTC, 0 replies.
- Storage at node or executor level - posted by Jean Georges Perrin <jg...@jgp.net> on 2017/12/23 06:29:37 UTC, 0 replies.
- How to use schema from one of the columns of a dataset to parse another column and create a flattened dataset using Spark Streaming 2.2.0? - posted by kant kodali <ka...@gmail.com> on 2017/12/23 11:50:43 UTC, 0 replies.
- Structured streaming checkpointing - posted by puneetloya <pu...@gmail.com> on 2017/12/23 19:46:14 UTC, 0 replies.
- Re: Custom Data Source for getting data from Rest based services - posted by Subarna Bhattacharyya <su...@climformatics.com> on 2017/12/24 01:56:47 UTC, 2 replies.
- Which kafka client to use with spark streaming - posted by Serkan TAS <Se...@enerjisa.com> on 2017/12/25 08:16:57 UTC, 3 replies.
- Spark Docker - posted by sujeet jog <su...@gmail.com> on 2017/12/25 08:54:10 UTC, 1 reply.
- Apache Spark - Structured Streaming graceful shutdown - posted by M Singh <ma...@yahoo.com.INVALID> on 2017/12/25 21:19:45 UTC, 4 replies.
- Apache Spark - Structured Streaming from file - checkpointing - posted by M Singh <ma...@yahoo.com.INVALID> on 2017/12/25 21:24:36 UTC, 1 reply.
- Apache Spark - (2.2.0) - window function for DataSet - posted by M Singh <ma...@yahoo.com.INVALID> on 2017/12/25 22:15:12 UTC, 1 reply.
- Is there a way to make the broker merge big result set faster? - posted by Mu Kong <ko...@gmail.com> on 2017/12/26 01:46:51 UTC, 0 replies.
- [Structured Streaming] Reuse computation result - posted by Shu Li Zheng <ne...@gmail.com> on 2017/12/26 10:32:10 UTC, 1 reply.
- Spark 2.2.1 worker invocation - posted by Christopher Piggott <cp...@gmail.com> on 2017/12/26 16:00:56 UTC, 1 reply.
- Problem in Spark-Kafka Connector - posted by Sitakant Mishra <si...@gmail.com> on 2017/12/26 19:34:34 UTC, 1 reply.
- Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer - posted by Geoff Von Allmen <ge...@ibleducation.com> on 2017/12/26 23:08:12 UTC, 3 replies.
- Spark and neural networks - posted by Esa Heikkinen <es...@student.tut.fi> on 2017/12/27 14:57:22 UTC, 0 replies.
- Partition Dataframe Using UDF On Partition Column - posted by Richard Primera <ri...@woombatcg.com> on 2017/12/27 16:37:31 UTC, 0 replies.
- Pyspark and searching items from data structures - posted by Esa Heikkinen <es...@student.tut.fi> on 2017/12/28 12:33:43 UTC, 0 replies.
- Spark on EMR suddenly stalling - posted by Jeroen Miller <bl...@gmail.com> on 2017/12/28 16:06:14 UTC, 12 replies.
- Cascading Spark Structured streams - posted by Eric Dain <er...@gmail.com> on 2017/12/28 22:14:44 UTC, 0 replies.
- Custom line/record delimiter - posted by sk skk <sp...@gmail.com> on 2017/12/29 17:19:29 UTC, 0 replies.
- Subqueries - posted by "Lalwani, Jayesh" <Ja...@capitalone.com> on 2017/12/29 21:02:27 UTC, 2 replies.
- Converting binary files - posted by Christopher Piggott <cp...@gmail.com> on 2017/12/30 20:44:07 UTC, 0 replies.
- Apache Spark - Using withWatermark for DataSets - posted by M Singh <ma...@yahoo.com.INVALID> on 2017/12/30 22:40:08 UTC, 0 replies.