Posted to dev@spark.apache.org by guxiaobo1982 <gu...@qq.com> on 2014/06/21 16:52:22 UTC

What about a general schema registration method for JavaSchemaRDD?

Hi,
 
The current implementation of JavaSchemaRDD needs a special JavaBean class to define the schema for each table. But when developing applications with the Spark SQL API, tables are a more dynamic component: the awkward part is that whenever a new table is defined we must write a new JavaBean and redeploy the whole application. So here is an idea for a more general schema registration method.
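For comparison, the current JavaBean-based flow looks roughly like the example in the Spark 1.0 programming guide (a sketch only; the Person bean, the file path, and the existing JavaSparkContext sc are placeholders):

import java.io.Serializable;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

// The schema is baked into a compiled bean class.
public static class Person implements Serializable {
    private String name;
    private int age;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}

// sc is an existing JavaSparkContext.
JavaSQLContext sqlCtx = new JavaSQLContext(sc);
JavaRDD<Person> people = sc.textFile("people.txt").map(
    new Function<String, Person>() {
        public Person call(String line) {
            String[] parts = line.split(",");
            Person p = new Person();
            p.setName(parts[0]);
            p.setAge(Integer.parseInt(parts[1].trim()));
            return p;
        }
    });
// Any change to the table layout means editing Person and redeploying the application.
JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
schemaPeople.registerAsTable("people");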
 
 
 
Step 1: Define a new Java class named RowSchema in the API to describe column information; the column names and data types are the most important pieces.
 
 
 
Step 2: The actual data is stored simply as a JavaRDD<Row>.
 
 
 
Step 3: When loading data into the JavaRDD<Row>, the API provides a general map function, which takes a RowSchema object as a parameter, to map each input line to a Row object (a sketch follows after the steps).
 
 
 
Step 4: Add a new applySchema method, which takes a RowSchema object as a parameter, to the JavaSQLContext class (also sketched below).
 
 
 
Step 5: The registerAsTable and all other SQL-related methods of the JavaSQLContext class should handle the difference between schemas defined through JavaBeans and through RowSchemas (that is the job of the API layer).
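To make Steps 3 and 4 concrete, here is a purely illustrative sketch built on the RowSchema class proposed below; rowMapper, convert, the Row.create factory, and the applySchema overload are all hypothetical parts of this proposal, not existing API:

// Hypothetical addition to RowSchema: build a reusable line-to-Row mapper (Step 3).
public Function<String, Row> rowMapper(final String delimiter) {
    return new Function<String, Row>() {
        public Row call(String line) {
            String[] parts = line.split(delimiter);
            Object[] values = new Object[getColNumber()];
            for (int i = 0; i < getColNumber(); i++) {
                // convert(...) would parse each field according to the declared
                // column type (e.g. "int" -> Integer, "string" -> String).
                values[i] = convert(parts[i].trim(), getColDataType(i));
            }
            return Row.create(values); // hypothetical Row factory for the Java API
        }
    };
}

// Hypothetical overload on JavaSQLContext, mirroring the bean-based method (Step 4).
public JavaSchemaRDD applySchema(JavaRDD<Row> rowRDD, RowSchema schema);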
 
 
 
The API would look something like this:
 
 
 
public class RowSchema implements java.io.Serializable {

    private final List<String> colNames;
    private final List<String> colDataTypes;

    public RowSchema(List<String> colNames, List<String> colDataTypes) {
        this.colNames = colNames;
        this.colDataTypes = colDataTypes;
    }

    public String getColName(int i) { return colNames.get(i); }         // name of column i
    public String getColDataType(int i) { return colDataTypes.get(i); } // data type of column i
    public int getColNumber() { return colNames.size(); }               // number of columns
}
 
RowSchema rs = new RowSchema(……);
 
 
 
JavaRDD<Row> table = ctx.textFile("file path").map(rs);
 
 
 
JavaSchemaRDD schemaPeople = sqlCtx.applySchema(table, rs);
 
schemaPeople.registerAsTable("people");
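Once registered, the table would be queried exactly as with the bean-based path today, for example (assuming the RowSchema above declares name and age columns):

JavaSchemaRDD teenagers = sqlCtx.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
List<String> names = teenagers.map(new Function<Row, String>() {
    public String call(Row row) { return "Name: " + row.getString(0); }
}).collect();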
 
Regards,
 

 Xiaobo Gu

Re: What about a general schema registration method for JavaSchemaRDD?

Posted by Reynold Xin <rx...@databricks.com>.
Thanks for the message.


There is an open issue about the public type / schema system that is
related to this topic: https://issues.apache.org/jira/browse/SPARK-2179

You probably want to comment on that ticket as well.


