Posted to dev@spark.apache.org by Nipun Batra <bn...@gmail.com> on 2015/04/17 21:52:03 UTC

BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

Hi

The example given in the SQL programming guide
https://spark.apache.org/docs/latest/sql-programming-guide.html
uses org.apache.spark.sql.Row, but that class does not seem to exist in the
Java API, or at least I was not able to find it.

Build info: downloaded from the Spark website.

Dependency:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.3.0</version>
  <scope>provided</scope>
</dependency>

Code from the documentation:

// Import factory methods provided by DataType.
import org.apache.spark.sql.types.DataType;
// Import StructType and StructField
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.types.StructField;
// Import Row.
import org.apache.spark.sql.Row;

// sc is an existing JavaSparkContext.
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

// Load a text file and convert each line to a JavaBean.
JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");

// The schema is encoded in a string
String schemaString = "name age";

// Generate the schema based on the string of schema
List<StructField> fields = new ArrayList<StructField>();
for (String fieldName: schemaString.split(" ")) {
  fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
}
StructType schema = DataType.createStructType(fields);

// Convert records of the RDD (people) to Rows.
JavaRDD<Row> rowRDD = people.map(
  new Function<String, Row>() {
    public Row call(String record) throws Exception {
      String[] fields = record.split(",");
      return Row.create(fields[0], fields[1].trim());
    }
  });

// Apply the schema to the RDD.
DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

// Register the DataFrame as a table.
peopleDataFrame.registerTempTable("people");

// SQL can be run over RDDs that have been registered as tables.
DataFrame results = sqlContext.sql("SELECT name FROM people");

// The results of SQL queries are DataFrames and support all the normal RDD operations.
// The columns of a row in the result can be accessed by ordinal.
List<String> names = results.map(new Function<Row, String>() {
  public String call(Row row) {
    return "Name: " + row.getString(0);
  }
}).collect();
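
Concretely, these are the pieces that fail to compile for me against
spark-sql_2.10 1.3.0 (a sketch of the symptoms; exact compiler messages
may differ):

  DataType.createStructField(...)   // javac: cannot find symbol
  DataType.StringType               // javac: cannot find symbol
  Row.create(...)                   // javac: cannot find symbol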


Thanks
Nipun

Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

Posted by Olivier Girardot <ss...@gmail.com>.
Hi Nipun,
You're right; I created a pull request fixing the documentation:
https://github.com/apache/spark/pull/5569
and the corresponding issue:
https://issues.apache.org/jira/browse/SPARK-6992
Thank you for your time,

Olivier.

On Sat, Apr 18, 2015 at 01:11, Nipun Batra <ba...@gmail.com> wrote:

> Hi Olivier
>
> Thank you for responding.
>
> I am able to find org.apache.spark.sql.Row in spark-catalyst_2.10-1.3.0,
> but I am pretty sure it was not visible in the API documentation yesterday
> (https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/package-frame.html).
>
> Also, I think this document needs to be changed:
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
> return Row.create(fields[0], fields[1].trim());
>
> needs to be replaced with RowFactory.create.
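>
> For example, a minimal sketch of the fixed mapper (assuming the 1.3 Java
> API's org.apache.spark.sql.RowFactory; this may not match the exact doc
> patch):
>
> import org.apache.spark.api.java.function.Function;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.RowFactory;
>
> JavaRDD<Row> rowRDD = people.map(
>   new Function<String, Row>() {
>     public Row call(String record) throws Exception {
>       // Build the Row through the factory instead of calling Row.create.
>       String[] parts = record.split(",");
>       return RowFactory.create(parts[0], parts[1].trim());
>     }
>   });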
>
> Thanks again for your response.
>
> Thanks
> Nipun Batra
>
>
>
> On Fri, Apr 17, 2015 at 2:50 PM, Olivier Girardot <ss...@gmail.com>
> wrote:
>
>> Hi Nipun,
>> I'm sorry, but I don't understand exactly what your problem is.
>> Regarding org.apache.spark.sql.Row, it does exist in the Spark SQL
>> dependency.
>> Is it a compilation problem?
>> Are you trying to run a main method using the pom you've just described,
>> or are you trying to spark-submit the jar?
>> If you're trying to run a main method, the 'provided' scope is not
>> designed for that and will make your program fail.
>>
>> Regards,
>>
>> Olivier.
>>
>> On Fri, Apr 17, 2015 at 21:52, Nipun Batra <bn...@gmail.com> wrote:
>>
>>> [original message quoted above]

Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

Posted by Olivier Girardot <ss...@gmail.com>.
Hi Nipun,
I'm sorry, but I don't understand exactly what your problem is.
Regarding org.apache.spark.sql.Row, it does exist in the Spark SQL
dependency.
Is it a compilation problem?
Are you trying to run a main method using the pom you've just described,
or are you trying to spark-submit the jar?
If you're trying to run a main method, the 'provided' scope is not
designed for that and will make your program fail.
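
With 'provided' dependencies the application is normally launched through
spark-submit, which supplies the Spark classes at runtime. A minimal sketch
(the main class and jar name below are made up for illustration):

  spark-submit --class com.example.SparkSQLExample --master local[*] \
    target/myapp-1.0.jar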

Regards,

Olivier.

On Fri, Apr 17, 2015 at 21:52, Nipun Batra <bn...@gmail.com> wrote:

> [original message quoted above]