Posted to issues@spark.apache.org by "Charles Drotar (JIRA)" <ji...@apache.org> on 2016/02/08 12:28:39 UTC

[jira] [Closed] (SPARK-13156) JDBC using multiple partitions creates additional tasks but only executes on one

     [ https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Drotar closed SPARK-13156.
----------------------------------
    Resolution: Not A Problem

The driver class was inhibiting concurrent connections. This was unrelated to Spark's JDBC functionality.

> JDBC using multiple partitions creates additional tasks but only executes on one
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-13156
>                 URL: https://issues.apache.org/jira/browse/SPARK-13156
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.5.0
>         Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client
>            Reporter: Charles Drotar
>
> I can successfully kick off a query through JDBC to Teradata, and when it runs it creates a task on each executor for every partition. The problem is that all of the tasks except one complete within a couple of seconds, and the final task handles the entire dataset.
> Example Code:
> // Connection properties for the Teradata JDBC driver
> private val properties = new java.util.Properties()
> properties.setProperty("driver", "com.teradata.jdbc.TeraDriver")
> properties.setProperty("username", "foo")
> properties.setProperty("password", "bar")
> val url = "jdbc:teradata://oneview/, TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10"
> val numPartitions = 5
> // Derive a synthetic partition column (id MOD numPartitions) so rows spread evenly
> // across partitions; note the f-interpolator is needed for $numPartitions%d to expand
> val dbTableTemp = f"(SELECT id MOD $numPartitions%d AS modulo, id FROM db.table) AS TEMP_TABLE"
> val partitionColumn = "modulo"
> val lowerBound = 0L
> val upperBound = (numPartitions - 1).toLong
> val df = sqlContext.read.jdbc(url, dbTableTemp, partitionColumn, lowerBound, upperBound, numPartitions, properties)
> df.write.parquet("/output/path/for/df/")
> When I look at the Spark UI I see the 5 tasks, but only 1 is actually querying.
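For context, here is a rough sketch (not from the original report, and not Spark source code) of how a partitioned JDBC read turns lowerBound/upperBound/numPartitions into one WHERE predicate per partition on the partition column. The exact stride arithmetic varies by Spark version, so treat this as illustrative only; the object and method names below are hypothetical.

// Illustrative only: approximates how one WHERE clause per JDBC partition is
// derived from (lowerBound, upperBound, numPartitions). Exact formula differs
// across Spark versions; this is not Spark source code.
object PartitionPredicateSketch {
  def predicates(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
    val stride = math.max((upper - lower) / numPartitions, 1L)
    (0 until numPartitions).map { i =>
      val lo = lower + i * stride
      val hi = lo + stride
      if (i == 0) s"$column < $hi"                       // first partition: open-ended below
      else if (i == numPartitions - 1) s"$column >= $lo" // last partition: open-ended above
      else s"$column >= $lo AND $column < $hi"
    }
  }

  def main(args: Array[String]): Unit = {
    // Same bounds as the report: modulo in [0, 4], 5 partitions
    predicates("modulo", 0L, 4L, 5).foreach(println)
  }
}

Each predicate becomes its own task, so in principle five concurrent JDBC sessions are opened against the database; per the resolution above, it was the Teradata driver class, not Spark, that prevented those sessions from running concurrently.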



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org