user@spark.apache.org, 2019-03

- error in sprark sql - posted by yuvraj singh <19...@gmail.com> on 2019/03/01 07:36:36 UTC, 1 replies.
- [Spark SQL]: sql.DataFrame.replace to accept regexp - posted by Nuno Silva <nu...@gmail.com> on 2019/03/01 10:21:35 UTC, 2 replies.
- spark.python.worker.memory VS spark.executor.pyspark.memory - posted by Andrey Dudin <du...@gmail.com> on 2019/03/01 13:24:38 UTC, 1 replies.
- Spark Streaming loading kafka source value column type - posted by oskarryn <os...@adaltas.com> on 2019/03/01 13:38:39 UTC, 0 replies.
- Re: to_avro and from_avro not working with struct type in spark 2.4 - posted by Gabor Somogyi <ga...@gmail.com> on 2019/03/01 14:54:14 UTC, 0 replies.
- Spark 2.4 Structured Streaming Kafka assign API polling same offsets - posted by Kristopher Kane <kk...@gmail.com> on 2019/03/01 15:10:06 UTC, 1 replies.
- Is there a way to validate the syntax of raw spark sql query? - posted by kant kodali <ka...@gmail.com> on 2019/03/01 16:44:02 UTC, 2 replies.
- Re: Spark on k8s - map persistentStorage for data spilling - posted by Matt Cheah <mc...@palantir.com> on 2019/03/01 19:30:08 UTC, 2 replies.
- SPARK Streaming Graphs - posted by Gourav Sengupta <go...@gmail.com> on 2019/03/01 20:47:26 UTC, 0 replies.
- updateBytesRead() - posted by swastik mittal <sm...@ncsu.edu> on 2019/03/02 04:30:42 UTC, 0 replies.
- Milliseconds in timestamp - posted by swastik mittal <sm...@ncsu.edu> on 2019/03/02 18:53:37 UTC, 0 replies.
- disable spark disk cache - posted by Andrey Dudin <du...@gmail.com> on 2019/03/03 21:46:58 UTC, 1 replies.
- Shuffle service with more than one executor - posted by Bruno Faria <br...@hotmail.com> on 2019/03/04 02:51:41 UTC, 0 replies.
- Spark SQL doesn't produce output while hive does - posted by mayangyang02 <ma...@imdada.cn> on 2019/03/04 06:18:05 UTC, 2 replies.
- spark df.write.partitionBy run very slow - posted by JF Chen <da...@gmail.com> on 2019/03/04 10:02:15 UTC, 9 replies.
- Timeout between driver and application master (Thrift Server) - posted by Jürgen Thomann <ju...@linfre.de> on 2019/03/04 12:49:19 UTC, 0 replies.
- Connect to hive 3 from spark - posted by Nicolas Paris <ni...@riseup.net> on 2019/03/04 13:58:43 UTC, 0 replies.
- Difference between One map vs multiple maps - posted by Yeikel <em...@yeikel.com> on 2019/03/05 03:13:02 UTC, 0 replies.
- [SQL] 64-bit hash function, and seeding - posted by Hu...@data61.csiro.au on 2019/03/05 04:30:31 UTC, 2 replies.
- subscribe - posted by Qian He <hq...@gmail.com> on 2019/03/05 05:48:37 UTC, 2 replies.
- Join selection - posted by Akhilanand <ak...@gmail.com> on 2019/03/05 07:20:04 UTC, 0 replies.
- (no subject) - posted by Shyam P <sh...@gmail.com> on 2019/03/05 07:54:56 UTC, 2 replies.
- How to add more imports at the start of REPL - posted by Nuthan Reddy <nu...@sigmoidanalytics.com> on 2019/03/05 11:14:21 UTC, 2 replies.
- C++ script on Spark Cluster throws exit status 132 - posted by Mkal <di...@hotmail.com> on 2019/03/05 11:25:32 UTC, 0 replies.
- Why does Apache Spark Master shutdown when Zookeeper expires the session - posted by lokeshkumar <lo...@dataken.net> on 2019/03/05 13:02:12 UTC, 1 replies.
- [PySpark] TypeError: expected string or bytes-like object - posted by Thomas Ryck <tr...@norsys.fr> on 2019/03/05 15:08:45 UTC, 0 replies.
- [Kubernets] [SPARK-27061] Need to expose 4040 port on driver service - posted by Chandu Kavar <cc...@gmail.com> on 2019/03/05 15:46:53 UTC, 3 replies.
- "java.lang.AssertionError: assertion failed: Failed to get records for **** after polling for 180000" error - posted by JF Chen <da...@gmail.com> on 2019/03/06 03:08:29 UTC, 4 replies.
- Re: How to group dataframe year-wise and iterate through groups and send each year to dataframe to executor? - posted by Shyam P <sh...@gmail.com> on 2019/03/06 06:31:27 UTC, 0 replies.
- 4 Apache Events in 2019: DC Roadshow soon; next up Chicago, Las Vegas, and Berlin! - posted by Rich Bowen <rb...@apache.org> on 2019/03/06 14:00:23 UTC, 0 replies.
- Structured Streaming to Kafka Topic - posted by Pankaj Wahane <pa...@live.com> on 2019/03/06 16:58:58 UTC, 1 replies.
- PysPark date_add function suggestion - posted by William Creger <Cl...@Mscience.com> on 2019/03/06 18:51:29 UTC, 0 replies.
- spark structured streaming crash due to decompressing gzip file failure - posted by Lian Jiang <ji...@gmail.com> on 2019/03/07 05:58:22 UTC, 2 replies.
- Hadoop free spark on kubernetes => NoClassDefFound - posted by Sommer Tobias <To...@esolutions.de> on 2019/03/07 09:09:10 UTC, 0 replies.
- [SparkSQL, user-defined Hadoop, K8s] Hadoop free spark on kubernetes => NoClassDefFound - posted by Sommer Tobias <To...@esolutions.de> on 2019/03/07 09:23:57 UTC, 0 replies.
- Re: mapreduce.input.fileinputformat.split.maxsize not working for spark 2.4.0 - posted by Akshay Mendole <ak...@gmail.com> on 2019/03/07 14:47:45 UTC, 0 replies.
- Difference between 'cores' config params: spark submit on k8s - posted by Battini Lakshman <ba...@gmail.com> on 2019/03/07 21:53:15 UTC, 0 replies.
- A spark streaming problem about shuffle operation - posted by li...@itri.org.tw on 2019/03/08 08:55:32 UTC, 0 replies.
- Optimize tables used more than once: make dataframe persistent or save as parquet - posted by zjzzjz <ji...@gmail.com> on 2019/03/10 01:10:57 UTC, 0 replies.
- How to know if a machine in a Spark cluster 'participate's a job - posted by zjzzjz <ji...@gmail.com> on 2019/03/10 01:14:29 UTC, 0 replies.
- use rocksdb for spark structured streaming (SSS) - posted by Lian Jiang <ji...@gmail.com> on 2019/03/10 18:54:05 UTC, 4 replies.
- returning type of function that needs to be passed to method 'mapWithState' - posted by "shicheng31604@gmail.com" <sh...@gmail.com> on 2019/03/11 08:42:24 UTC, 0 replies.
- unsubscribe - posted by Byron Lee <by...@gmail.com> on 2019/03/11 17:40:55 UTC, 1 replies.
- read json and write into parquet in executors - posted by Lian Jiang <ji...@gmail.com> on 2019/03/12 02:52:21 UTC, 0 replies.
- [SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle - posted by wangfei <hz...@163.com> on 2019/03/12 06:19:21 UTC, 2 replies.
- Mutating broadcast variable from executors, any risks even if done in a thread-safe manner? - posted by "Jan Brabec (janbrabe)" <ja...@cisco.com> on 2019/03/12 09:39:25 UTC, 0 replies.
- [SPARK SQL] How to overwrite a Hive table with spark sql (SPARK2) - posted by lu...@china-inv.cn on 2019/03/12 11:13:47 UTC, 0 replies.
- K8s for spark-2.2.3 - posted by puneetloya <pu...@gmail.com> on 2019/03/12 18:49:13 UTC, 0 replies.
- Build spark source code with scala 2.11 - posted by swastik mittal <sm...@ncsu.edu> on 2019/03/12 23:26:06 UTC, 3 replies.
- SUBSCRIBE - posted by Anbazhagan Muthuramalingam <an...@gmail.com> on 2019/03/14 03:28:30 UTC, 0 replies.
- Windowing LAG function Usage in Spark2.2 Dataset scala - posted by anbu <an...@gmail.com> on 2019/03/14 03:55:19 UTC, 1 replies.
- Spark scala Date Usage - posted by anbu <an...@gmail.com> on 2019/03/14 03:59:47 UTC, 0 replies.
- Multiple context in one Driver - posted by Ido Friedman <id...@equalum.io> on 2019/03/14 06:37:37 UTC, 2 replies.
- Structured Streaming & Query Planning - posted by Paolo Platter <pa...@agilelab.it> on 2019/03/14 15:50:59 UTC, 4 replies.
- Yarn job is Stuck - posted by dimitris plakas <di...@gmail.com> on 2019/03/14 16:23:34 UTC, 1 replies.
- How does spark operate internally for an indivisual task? - posted by swastik mittal <sm...@ncsu.edu> on 2019/03/14 16:53:57 UTC, 0 replies.
- Masking username in Spark with regexp_replace and reverse functions - posted by Mich Talebzadeh <mi...@gmail.com> on 2019/03/16 17:39:07 UTC, 3 replies.
- Spark Streaming: schema mismatch using MicroBatchReader with columns pruning - posted by kineret M <ki...@gmail.com> on 2019/03/16 20:09:56 UTC, 0 replies.
- Spark ML on Python has short memory? - posted by Saif Addin <sa...@gmail.com> on 2019/03/16 20:49:02 UTC, 0 replies.
- what is the difference between udf execution and map(someLambda)? - posted by kant kodali <ka...@gmail.com> on 2019/03/17 18:41:48 UTC, 1 replies.
- Spark on Mesos broken on 2.4 ? - posted by Jorge Machado <jo...@me.com.INVALID> on 2019/03/18 06:49:32 UTC, 0 replies.
- pysphark sql filters regular expression double backslashes, resulting in incorrect results - posted by 李斌松 <li...@gmail.com> on 2019/03/18 07:51:01 UTC, 0 replies.
- Reuse broadcasted data frame in multiple query - posted by Lu Liu <li...@gmail.com> on 2019/03/18 09:28:16 UTC, 0 replies.
- Spark does not load all classes in fat jar - posted by Federico D'Ambrosio <fe...@gmail.com> on 2019/03/18 11:34:14 UTC, 3 replies.
- Spark Metrics : Job Remains In "Running" State - posted by "Jain, Abhishek 3. (Nokia - IN/Bangalore)" <ab...@nokia.com> on 2019/03/18 13:15:07 UTC, 0 replies.
- java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT on EMR - posted by Daniel Zhang <ja...@hotmail.com> on 2019/03/18 15:46:41 UTC, 0 replies.
- Expecting 'type' to be present - posted by Jorge Machado <jo...@me.com.INVALID> on 2019/03/18 19:47:07 UTC, 0 replies.
- Spark - Hadoop custom filesystem service loading - posted by Jhon Anderson Cardenas Diaz <jh...@gmail.com> on 2019/03/18 20:17:13 UTC, 1 replies.
- Writing the contents of spark dataframe to Kafka with Spark 2.2 - posted by anna stax <an...@gmail.com> on 2019/03/18 21:06:46 UTC, 5 replies.
- Creating Hive Persistent view using Spark Sql defaults to Sequence File Format - posted by arun rajesh <ar...@gmail.com> on 2019/03/19 09:14:34 UTC, 0 replies.
- LocationStratgies.PreferFixed in Structured Streaming - posted by Subacini Balakrishnan <su...@gmail.com> on 2019/03/19 18:17:19 UTC, 0 replies.
- Re: Spark 2.2 Structured Streaming + Kinesis - posted by Gourav Sengupta <go...@gmail.com> on 2019/03/19 23:30:03 UTC, 0 replies.
- [HELP WANTED] Apache Zipkin (incubating) needs Spark gurus - posted by Andriy Redko <dr...@gmail.com> on 2019/03/21 00:16:07 UTC, 1 replies.
- How shall I configure the Spark executor memory size and the Alluxio worker memory size on a machine? - posted by u9g <lw...@163.com> on 2019/03/21 15:26:18 UTC, 0 replies.
- Spark streaming error - Query terminated with exception: assertion failed: Invalid batch: a#660,b#661L,c#662,d#663,,… 26 more fields != b#1291L - posted by kineret M <ki...@gmail.com> on 2019/03/21 17:40:53 UTC, 0 replies.
- Fwd: Cross Join - posted by asma zgolli <zg...@gmail.com> on 2019/03/21 17:46:28 UTC, 1 replies.
- Re: Manually reading parquet files. - posted by Ryan Blue <rb...@netflix.com.INVALID> on 2019/03/21 22:31:11 UTC, 1 replies.
- spark sql occer error - posted by "563280193@qq.com" <56...@qq.com> on 2019/03/22 07:39:19 UTC, 3 replies.
- Java Heap Space error - Spark ML - posted by KhajaAsmath Mohammed <md...@gmail.com> on 2019/03/22 15:19:24 UTC, 1 replies.
- writing a small csv to HDFS is super slow - posted by Lian Jiang <ji...@gmail.com> on 2019/03/22 21:34:29 UTC, 6 replies.
- How to control batch size while reading from hdfs files? - posted by kant kodali <ka...@gmail.com> on 2019/03/23 02:02:05 UTC, 0 replies.
- [spark context / spark sql] unexpected disk IO activity after spark job finished but spark context has not - posted by Chenghao <ch...@cs.umass.edu> on 2019/03/23 06:51:59 UTC, 1 replies.
- spark core / spark sql -- unexpected disk IO activity after all the spark tasks finished but spark context has not stopped. - posted by Chenghao <ch...@cs.umass.edu> on 2019/03/23 07:21:04 UTC, 0 replies.
- JavaRDD and WrappedArrays type iterate - posted by 1266 <10...@qq.com> on 2019/03/23 15:45:18 UTC, 0 replies.
- [spark sql performance] Only 1 executor to write output? - posted by Mike Chan <mi...@gmail.com> on 2019/03/23 20:10:43 UTC, 7 replies.
- Where does the Driver run? - posted by Pat Ferrel <pa...@occamsmachete.com> on 2019/03/23 21:12:53 UTC, 15 replies.
- Apache Spark Newsletter Issue 2 - posted by Ankur Gupta <an...@outlook.com> on 2019/03/23 22:12:50 UTC, 0 replies.
- How to support writeStream in data source v2 (spark 2.3.1)? - posted by kineret M <ki...@gmail.com> on 2019/03/24 20:30:46 UTC, 0 replies.
- Upcoming talks on BigDL and Analytics Zoo this week - posted by Jason Dai <ja...@gmail.com> on 2019/03/25 04:03:28 UTC, 0 replies.
- Window function range between - posted by Kumar sp <kr...@gmail.com> on 2019/03/25 18:35:53 UTC, 1 replies.
- Understanding State Store storage behavior for the Stream Deduplication function - posted by Gerard Maas <ge...@gmail.com> on 2019/03/25 19:17:27 UTC, 2 replies.
- streaming - absolute maximum - posted by Jason Nerothin <ja...@gmail.com> on 2019/03/26 00:04:44 UTC, 0 replies.
- RPC timeout error for AES based encryption between driver and executor - posted by "Sinha, Breeta (Nokia - IN/Bangalore)" <br...@nokia.com> on 2019/03/26 09:29:07 UTC, 2 replies.
- Spark Thrift Server 2.2.1 - posted by Tomasz Krol <pa...@gmail.com> on 2019/03/26 11:51:50 UTC, 0 replies.
- Spark Profiler - posted by Jack Kolokasis <ko...@ics.forth.gr> on 2019/03/26 12:59:38 UTC, 6 replies.
- spark.submit.deployMode: cluster - posted by Pat Ferrel <pa...@occamsmachete.com> on 2019/03/26 20:56:34 UTC, 8 replies.
- SortMerge Join on partitioned column causes shuffle - posted by lsn24 <le...@gmail.com> on 2019/03/27 00:29:22 UTC, 0 replies.
- Spark Kafka Batch Write guarantees - posted by hemant singh <he...@gmail.com> on 2019/03/27 08:15:23 UTC, 0 replies.
- Parquet File Output Sink - Spark Structured Streaming - posted by Matt Kuiper <ma...@polarisalpha.com> on 2019/03/27 15:45:15 UTC, 0 replies.
- Re: Parquet File Output Sink - Spark Structured Streaming - posted by Gabor Somogyi <ga...@gmail.com> on 2019/03/27 16:20:18 UTC, 1 replies.
- Streaming data out of spark to a Kafka topic - posted by Mich Talebzadeh <mi...@gmail.com> on 2019/03/27 19:47:38 UTC, 2 replies.
- Spark migration to Kubernetes - posted by thrisha <ts...@threatmetrix.com> on 2019/03/27 22:52:46 UTC, 0 replies.
- Fwd: How Spark coordinates multi contender race on writing zookeeper? (Also on stackoverflow) - posted by Zili Chen <wa...@gmail.com> on 2019/03/27 23:42:14 UTC, 1 replies.
- How to extract data in parallel from RDBMS tables - posted by "Surendra , Manchikanti" <su...@gmail.com> on 2019/03/28 04:06:00 UTC, 3 replies.
- Udfs in spark - posted by Achilleus 003 <ac...@gmail.com> on 2019/03/28 04:45:25 UTC, 0 replies.
- Adaptive query execution and CBO - posted by Tomasz Krol <pa...@gmail.com> on 2019/03/28 09:52:30 UTC, 0 replies.
- BLAS library class def not found error - posted by Serena S Yuan <su...@gmail.com> on 2019/03/28 23:10:32 UTC, 0 replies.
- Dataset schema incompatibility bug when reading column partitioned data - posted by Dávid Szakállas <da...@gmail.com> on 2019/03/29 13:15:27 UTC, 0 replies.
- Spark SQL API taking longer time than DF API. - posted by neeraj bhadani <bh...@gmail.com> on 2019/03/29 14:10:22 UTC, 3 replies.
- spark generates corrupted parquet files - posted by Lian Jiang <ji...@gmail.com> on 2019/03/29 17:23:27 UTC, 0 replies.
- ClassCastException for SerializedLamba - posted by Koert Kuipers <ko...@tresata.com> on 2019/03/29 20:01:52 UTC, 1 replies.