Posted to issues@spark.apache.org by "Paul Wu (JIRA)" <ji...@apache.org> on 2015/05/21 23:56:18 UTC
[jira] [Updated] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatly
[ https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Wu updated SPARK-7804:
---------------------------
Description:
The RDD contains only one record, repeated once for each row in the table:
I have a table like:
attuid  name  email
12      john  john@appp.com
23      tom   tom@appp.com
34      tony  tony@appp.com
My code:
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "....";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));

List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
        JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
        schema,
        " USERS",
        new String[]{"attuid", "name", "email"},
        new Filter[]{ },
        partitionList.toArray(new JDBCPartition[0])
);

System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
===========================
result is :
34
tony
tony@appp.com
34
tony
tony@appp.com
34
tony
tony@appp.com
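
One possible explanation for output of this shape, offered as an assumption and not confirmed anywhere in this message: the row count is correct (three), but every collected row shows the values of the last record, which is what happens when an iterator hands out the same mutable row object for every record and the consumer keeps references without copying. The sketch below reproduces that pattern in plain Java with no Spark dependency. The names RowReuseDemo, reusingIterator and collect are hypothetical and are not part of Spark or of the reporter's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RowReuseDemo {

    // Same sample data as the table in the report.
    static final String[][] USERS = {
        {"12", "john", "john@appp.com"},
        {"23", "tom", "tom@appp.com"},
        {"34", "tony", "tony@appp.com"},
    };

    // Hypothetical stand-in for a data source that reuses ONE mutable row
    // buffer: each call to next() overwrites it and returns the same object.
    static Iterator<String[]> reusingIterator(final String[][] table) {
        final String[] shared = new String[table[0].length];
        return new Iterator<String[]>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < table.length;
            }

            @Override
            public String[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                System.arraycopy(table[index++], 0, shared, 0, shared.length);
                return shared;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Collects rows into a list, optionally copying each row first.
    static List<String[]> collect(Iterator<String[]> rows, boolean copy) {
        List<String[]> out = new ArrayList<>();
        while (rows.hasNext()) {
            String[] row = rows.next();
            out.add(copy ? row.clone() : row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Without copying: three references to the same buffer, so the
        // last record is printed three times.
        for (String[] row : collect(reusingIterator(USERS), false)) {
            System.out.println(Arrays.toString(row));
        }
        // With copying: each record is preserved.
        for (String[] row : collect(reusingIterator(USERS), true)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

If this is indeed the cause here, the things to try would be copying each row before caching or collecting it, or reading the table through the public DataFrame JDBC data source instead of constructing JDBCRDD directly.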
was:
Getting only one record, repeated in the RDD, with a repeated field value:
I have a table like:
attuid  name  email
12      john  john@appp.com
23      tom   tom@appp.com
34      tony  tony@appp.com
My code:
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "....";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));

List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
        JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
        schema,
        " USERS",
        new String[]{"attuid", "name", "email"},
        new Filter[]{ },
        partitionList.toArray(new JDBCPartition[0])
);

System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
===========================
result is :
34
34
tony@appp.com
34
34
tony@appp.com
34
34
tony@appp.com
Summary: Incorrect results from JDBCRDD -- one record repeatly (was: Incorrect results from JDBCRDD -- one record repeatly and incorrect field value )
> Incorrect results from JDBCRDD -- one record repeatly
> -----------------------------------------------------
>
> Key: SPARK-7804
> URL: https://issues.apache.org/jira/browse/SPARK-7804
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0, 1.3.1
> Reporter: Paul Wu
> Labels: JDBCRDD, sql