Posted to issues@spark.apache.org by "zhuo bao (JIRA)" <ji...@apache.org> on 2017/03/05 07:11:32 UTC
[jira] [Commented] (SPARK-13156) JDBC using multiple partitions
creates additional tasks but only executes on one
[ https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896100#comment-15896100 ]
zhuo bao commented on SPARK-13156:
----------------------------------
I had the same problem, and found that the cause is this line:
val upperBound = (numPartitions-1).toLong
upperBound should be treated as exclusive, so the value here should be numPartitions.toLong. With that change, all threads/partitions are populated with records.
Please also verify your record count at the end.
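To illustrate why the smaller bound leaves most partitions empty, here is a minimal sketch. It assumes the per-partition WHERE clauses are built from an integer stride of upperBound / numPartitions - lowerBound / numPartitions, which is my understanding of how the JDBC source in Spark versions of that era split the range. The helper partitionPredicates is hypothetical and is not Spark's actual code.

```scala
// Hypothetical helper (not Spark's source): builds one WHERE clause per
// partition, ASSUMING an integer stride computed as
// upperBound / numPartitions - lowerBound / numPartitions.
def partitionPredicates(
    column: String,
    lowerBound: Long,
    upperBound: Long,
    numPartitions: Int): Seq[String] = {
  // Integer division: with bounds 0 and 4 and 5 partitions this is 0.
  val stride = upperBound / numPartitions - lowerBound / numPartitions
  (0 until numPartitions).map { i =>
    // The first partition has no lower limit, the last has no upper limit.
    val lower =
      if (i != 0) Some(s"$column >= ${lowerBound + i * stride}") else None
    val upper =
      if (i != numPartitions - 1) Some(s"$column < ${lowerBound + (i + 1) * stride}") else None
    Seq(lower, upper).flatten.mkString(" AND ")
  }
}

// upperBound = numPartitions - 1: stride is 0, so only the last clause matches rows.
partitionPredicates("modulo", 0L, 4L, 5).foreach(println)
// modulo < 0
// modulo >= 0 AND modulo < 0   (three times)
// modulo >= 0

// upperBound = numPartitions: stride is 1, so each modulo value gets its own clause.
partitionPredicates("modulo", 0L, 5L, 5).foreach(println)
// modulo < 1
// modulo >= 1 AND modulo < 2
// modulo >= 2 AND modulo < 3
// modulo >= 3 AND modulo < 4
// modulo >= 4
```

Under that assumption, the bounds only shape how the range is sliced; they do not filter rows, which is why the record count should stay the same either way.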
> JDBC using multiple partitions creates additional tasks but only executes on one
> --------------------------------------------------------------------------------
>
> Key: SPARK-13156
> URL: https://issues.apache.org/jira/browse/SPARK-13156
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 1.5.0
> Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client
> Reporter: Charles Drotar
>
> I can successfully kick off a query through JDBC to Teradata, and when it runs it creates a task on each executor for every partition. The problem is that all of the tasks except for one complete within a couple seconds and the final task handles the entire dataset.
> Example Code:
> private val properties = new java.util.Properties()
> properties.setProperty("driver","com.teradata.jdbc.TeraDriver")
> properties.setProperty("username","foo")
> properties.setProperty("password","bar")
> val url = "jdbc:teradata://oneview/, TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10"
> val numPartitions = 5
> val dbTableTemp = f"( SELECT id MOD $numPartitions%d AS modulo, id FROM db.table) AS TEMP_TABLE"
> val partitionColumn = "modulo"
> val lowerBound = 0.toLong
> val upperBound = (numPartitions-1).toLong
> val df = sqlContext.read.jdbc(url,dbTableTemp,partitionColumn,lowerBound,upperBound,numPartitions,properties)
> df.write.parquet("/output/path/for/df/")
> When I look at the Spark UI I see the 5 tasks, but only 1 is actually querying.