Posted to issues@kudu.apache.org by "Andrew Ya (JIRA)" <ji...@apache.org> on 2017/11/07 14:07:00 UTC

[jira] [Updated] (KUDU-2210) Apache Spark stucks while reading Kudu table.

     [ https://issues.apache.org/jira/browse/KUDU-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ya updated KUDU-2210:
----------------------------
    Component/s: perf
                 client

> Apache Spark stucks while reading Kudu table.
> ---------------------------------------------
>
>                 Key: KUDU-2210
>                 URL: https://issues.apache.org/jira/browse/KUDU-2210
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, perf, spark
>            Reporter: Andrew Ya
>
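> When I try to read a Kudu table with Apache Spark using the following code:
> {code}
> import org.apache.kudu.spark.kudu._   // adds the .kudu method to DataFrameReader
> import sqlContext.implicits._         // assumes an existing SQLContext named sqlContext
> // Kudu table name and comma-separated list of Kudu master addresses
> val kuduOptions: Map[String, String] = Map(
>   "kudu.table"  -> "test_table",
>   "kudu.master" -> "host1:7051,host2:7051,host3:7051")
> val kuduDF = sqlContext.read.options(kuduOptions).kudu
> kuduDF.registerTempTable("t")
> // Select rows with id 1111 or 2222; print up to 50 rows without truncating columns
> sqlContext.sql("SELECT * FROM t WHERE id IN (1111, 2222)").show(50, false)
> {code}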
> After 95% of the tasks complete, the job hangs for more than three days. The table is partitioned by date, and the partitions are unevenly sized: one partition is 12 GB, about 20 partitions are between 1 GB and 3 GB each, and the remaining partitions hold only megabytes or kilobytes of data.
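
A note on the partition layout described above: a Spark stage finishes only when its slowest task does, so uneven partitions stretch the tail of a scan. The Scala sketch below is a hypothetical, self-contained model with no Spark or Kudu dependencies. It assumes one task per table partition, a fixed number of executor slots, tasks launched in input order onto the earliest-free slot, and a constant scan rate. The slot count (8), the scan rate (50 MB/s), and the twenty 1 MB partitions are illustrative assumptions and do not come from this report. SkewModel, finishTimes, and timeToFraction are hypothetical names.

```scala
// Hypothetical straggler model, not Spark's scheduler or kudu-spark code.
// Assumptions: one task per partition, fixed slots, constant scan rate,
// tasks launched in input order onto the earliest-free slot.
object SkewModel {
  /** Finish time (seconds) of every task, sorted ascending. */
  def finishTimes(sizesMb: Seq[Double], slots: Int, mbPerSec: Double): Seq[Double] = {
    require(slots > 0 && mbPerSec > 0, "slots and rate must be positive")
    val free = Array.fill(slots)(0.0) // time at which each slot becomes idle
    val ends = sizesMb.map { size =>
      val i = free.indices.minBy(j => free(j)) // earliest idle slot (lowest index on ties)
      free(i) += size / mbPerSec               // the task occupies that slot until done
      free(i)                                  // this task's finish time
    }
    ends.sorted
  }

  /** Time by which at least `fraction` of the tasks have finished. */
  def timeToFraction(sortedEnds: Seq[Double], fraction: Double): Double = {
    require(sortedEnds.nonEmpty && fraction > 0 && fraction <= 1)
    val k = math.ceil(fraction * sortedEnds.size).toInt.max(1)
    sortedEnds(k - 1)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative sizes in MB: one 12 GB partition, twenty 2 GB partitions,
    // and twenty 1 MB partitions (the small sizes are made up).
    val sizes = Seq(12000.0) ++ Seq.fill(20)(2000.0) ++ Seq.fill(20)(1.0)
    val ends = finishTimes(sizes, slots = 8, mbPerSec = 50.0)
    println(f"95%% of tasks done at ${timeToFraction(ends, 0.95)}%.1f s")
    println(f"all tasks done at ${ends.last}%.1f s")
  }
}
```

With these inputs, the model reports 120.0 s for 95% of the tasks and 240.0 s for the whole stage, because the single 12 GB task alone takes 240 s at the assumed rate. The predicted tail is bounded by the scan time of the largest partition, so a model like this can help judge whether skew alone plausibly accounts for an observed stall.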



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)