Posted to issues@spark.apache.org by "Paul Wu (JIRA)" <ji...@apache.org> on 2015/05/21 23:56:18 UTC
[jira] [Updated] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatly

     [ https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Wu updated SPARK-7804:
---------------------------
    Description: 
JDBCRDD returns only one record, repeated once for each row in the table, instead of the distinct rows.

I have a table like:
attuid  name  email
12      john  john@appp.com
23      tom   tom@appp.com
34      tony  tony@appp.com

My code:

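    // Imports the snippet needs. The org.apache.spark.sql.* package names are
    // the ones I believe apply to Spark 1.3.x; adjust them for other versions.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.jdbc.JDBCPartition;
    import org.apache.spark.sql.jdbc.JDBCRDD;
    import org.apache.spark.sql.sources.Filter;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    // sparkConf is created elsewhere; the JDBC URL is elided.
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    String url = "....";
    Properties prop = new Properties();

    // A single partition whose WHERE clause ("1=1") matches every row.
    List<JDBCPartition> partitionList = new ArrayList<>();
    partitionList.add(new JDBCPartition("1=1", 0));

    // Schema: all three columns are read as nullable strings.
    List<StructField> fields = new ArrayList<StructField>();
    fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
    fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
    fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
    StructType schema = DataTypes.createStructType(fields);

    // Build the JDBCRDD directly over the USERS table, with no filters.
    JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
            JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
            schema,
            " USERS",
            new String[]{"attuid", "name", "email"},
            new Filter[]{ },
            partitionList.toArray(new JDBCPartition[0]));

    // Cache and count the RDD, then convert it to a JavaRDD and count again.
    System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
    JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
    System.out.println("count=" + jrdd.count());

    // Collect the rows to the driver and print every field, one per line.
    List<Row> lr = jrdd.collect();
    for (Row r : lr) {
        for (int ii = 0; ii < r.length(); ii++) {
            System.out.println(r.getString(ii));
        }
    }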
===========================
The result is:
34
tony
 tony@appp.com
34
tony
 tony@appp.com
34
tony 
 tony@appp.com
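
The output above does not say why the last row shows up three times. One mechanism that produces exactly this symptom is an iterator that hands back the same mutable object for every record, so that anything holding on to the references (a cache, a collect) ends up with N pointers to whatever was written last. Whether JDBCRDD does this in 1.3.x is an assumption on my part, not something confirmed here. The sketch below uses plain Java with no Spark classes; ReusedRowDemo, reusingIterator, collect and sampleTable are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReusedRowDemo {

    /** The three rows from the table in this report. */
    static List<String[]> sampleTable() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"12", "john", "john@appp.com"});
        rows.add(new String[]{"23", "tom", "tom@appp.com"});
        rows.add(new String[]{"34", "tony", "tony@appp.com"});
        return rows;
    }

    /**
     * Hypothetical stand-in for a data source that fills ONE shared buffer
     * per record and returns that same buffer from every call to next().
     */
    static Iterator<String[]> reusingIterator(final List<String[]> source) {
        final String[] shared = new String[3];
        final Iterator<String[]> it = source.iterator();
        return new Iterator<String[]>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public String[] next() {
                String[] record = it.next();
                // Overwrite the shared buffer instead of allocating a new row.
                System.arraycopy(record, 0, shared, 0, shared.length);
                return shared;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    /** Gathers rows into a list, optionally copying each one first. */
    static List<String[]> collect(Iterator<String[]> rows, boolean copyEachRow) {
        List<String[]> out = new ArrayList<>();
        while (rows.hasNext()) {
            String[] row = rows.next();
            out.add(copyEachRow ? row.clone() : row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Without copying, every list entry is the same shared buffer.
        for (String[] r : collect(reusingIterator(sampleTable()), false)) {
            System.out.println("no copy:   " + String.join(" ", r));
        }
        // With a copy per row, each entry keeps its own values.
        for (String[] r : collect(reusingIterator(sampleTable()), true)) {
            System.out.println("with copy: " + String.join(" ", r));
        }
    }
}
```

If something like this is going on, storing a copy of each row before caching or collecting should give three distinct records, which would be a quick way to confirm or rule out the theory.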


  was:
Getting only one  record repeated in the RDD and repeated field value:
 
I have a table like:
attuid  name email
12  john   john@appp.com
23  tom   tom@appp.com
34  tony  tony@appp.com

My code:

 JavaSparkContext sc = new JavaSparkContext(sparkConf);

        String url = "....";

        java.util.Properties prop = new Properties();

        List<JDBCPartition> partitionList = new ArrayList<>();

        //int i;

        partitionList.add(new JDBCPartition("1=1", 0));

        
        List<StructField> fields = new ArrayList<StructField>();
        fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
        fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
        fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
        StructType schema = DataTypes.createStructType(fields);
        JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
                JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
                 
                schema,
                " USERS",
                new String[]{"attuid", "name", "email"},
                new Filter[]{ },
                
                partitionList.toArray(new JDBCPartition[0])
      
        );

    
        System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
        JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
        System.out.println("count=" + jrdd.count());
        List<Row> lr = jrdd.collect();
        for (Row r : lr) {
            for (int ii = 0; ii < r.length(); ii++) {
                System.out.println(r.getString(ii));
            }
        }
===========================
result is :
34
34 
 tony@appp.com
34
34 
 tony@appp.com
34
34 
 tony@appp.com


        Summary: Incorrect results from JDBCRDD -- one record repeatly  (was: Incorrect results from JDBCRDD -- one record repeatly and incorrect field value )

> Incorrect results from JDBCRDD -- one record repeatly
> -----------------------------------------------------
>
>                 Key: SPARK-7804
>                 URL: https://issues.apache.org/jira/browse/SPARK-7804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1
>            Reporter: Paul Wu
>              Labels: JDBCRDD, sql


