Posted to issues@spark.apache.org by "zhuo bao (JIRA)" <ji...@apache.org> on 2017/03/05 07:11:32 UTC

[jira] [Commented] (SPARK-13156) JDBC using multiple partitions creates additional tasks but only executes on one

    [ https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896100#comment-15896100 ] 

zhuo bao commented on SPARK-13156:
----------------------------------

I ran into the same problem, and traced it to this line:

val upperBound = (numPartitions-1).toLong

upperBound is effectively EXCLUSIVE, so the value here should be numPartitions.toLong. With that change, every task/partition is populated with records. One caveat: verify your total record count at the end to confirm nothing was dropped or duplicated.
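
To see why, below is a minimal sketch of the partition-predicate logic (a paraphrase of Spark's JDBCRelation.columnPartition, not the exact source; whereClauses is a hypothetical helper, not a Spark API):

// Paraphrased sketch of how Spark derives one WHERE clause per partition.
def whereClauses(col: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
  // Integer division: with lowerBound 0, the stride is 0 whenever upperBound < numPartitions.
  val stride = upper / numPartitions - lower / numPartitions
  var current = lower
  (0 until numPartitions).map { i =>
    val lBound = if (i != 0) s"$col >= $current" else null
    current += stride
    val uBound = if (i != numPartitions - 1) s"$col < $current" else null
    if (uBound == null) lBound
    else if (lBound == null) uBound
    else s"$lBound AND $uBound"
  }
}

// upperBound = numPartitions - 1 = 4 gives stride 4/5 - 0/5 = 0: partitions 0-3
// get unsatisfiable predicates such as "modulo >= 0 AND modulo < 0", while the
// last partition gets "modulo >= 0", i.e. the whole table on a single task.
whereClauses("modulo", 0L, 4L, 5).foreach(println)

// upperBound = numPartitions = 5 gives stride 1: one modulo value per task.
whereClauses("modulo", 0L, 5L, 5).foreach(println)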

> JDBC using multiple partitions creates additional tasks but only executes on one
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-13156
>                 URL: https://issues.apache.org/jira/browse/SPARK-13156
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.5.0
>         Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client
>            Reporter: Charles Drotar
>
> I can successfully kick off a query through JDBC to Teradata, and when it runs it creates a task on each executor for every partition. The problem is that all of the tasks except one complete within a couple of seconds, and the final task handles the entire dataset.
> Example Code:
> private val properties = new java.util.Properties()
> properties.setProperty("driver","com.teradata.jdbc.TeraDriver")
> properties.setProperty("user","foo")
> properties.setProperty("password","bar")
> val url = "jdbc:teradata://oneview/TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10"
> val numPartitions = 5
> val dbTableTemp = s"( SELECT id MOD $numPartitions AS modulo, id FROM db.table) AS TEMP_TABLE"
> val partitionColumn = "modulo"
> val lowerBound = 0.toLong
> val upperBound = (numPartitions-1).toLong
> val df = sqlContext.read.jdbc(url,dbTableTemp,partitionColumn,lowerBound,upperBound,numPartitions,properties)
> df.write.parquet("/output/path/for/df/")
> When I look at the Spark UI I see the 5 tasks, but only 1 is actually querying.
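
Applied to the snippet above, the fix from the comment is just the bounds (a sketch reusing the reporter's variables; the table and URL are as in the report):

val lowerBound = 0L
val upperBound = numPartitions.toLong  // was (numPartitions - 1).toLong
val df = sqlContext.read.jdbc(url, dbTableTemp, partitionColumn,
  lowerBound, upperBound, numPartitions, properties)

With lowerBound = 0 and upperBound = 5 the stride is 5/5 - 0/5 = 1, so each of the 5 tasks reads exactly one modulo value.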


