Posted to dev@spark.apache.org by Niranda Perera <ni...@wso2.com> on 2014/11/28 07:31:13 UTC

Creating a SchemaRDD from an existing API

Hi,

I am evaluating Spark for an analytics component where we do batch
processing of data using SQL.

So, I am particularly interested in Spark SQL and in creating a SchemaRDD
from an existing API [1].

This API exposes elements in a database as data sources. Using the methods
exposed by these data sources, we can access and edit the data.

So, I want to create a custom SchemaRDD using the methods and provisions of
this API. I went through the Spark documentation and the Javadocs, but
unfortunately I was unable to determine whether this is actually possible.

I would like to ask the Spark devs:
1. As of the current Spark release, can we make a custom SchemaRDD?
2. What is the extension point for a custom SchemaRDD? Are there
particular interfaces to implement?
3. Could you please point me to the specific docs on this matter?

Your help in this regard is highly appreciated.

Cheers

[1]
https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 <https://twitter.com/N1R44>
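
On question 1 above: even without any extension point, a SchemaRDD can be
built programmatically from an ordinary RDD of Row objects plus a StructType
via SQLContext.applySchema (available since Spark 1.1). A minimal sketch,
assuming the Spark 1.2 package layout where the type classes are visible via
org.apache.spark.sql; the column names and values are purely illustrative,
and in practice the rows would come from calls into the external API.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql._

    object ApplySchemaSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("applySchema-sketch").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)

        // In practice these rows would be produced by calls into the external API.
        val rowRDD = sc.parallelize(Seq(Row(1, "alice"), Row(2, "bob")))

        // The schema describing the rows above: an int column and a string column.
        val schema = StructType(Seq(
          StructField("id",   IntegerType, nullable = false),
          StructField("name", StringType,  nullable = true)))

        // applySchema pairs the rows with the schema, yielding a SchemaRDD that
        // can be registered and queried with SQL.
        val people = sqlContext.applySchema(rowRDD, schema)
        people.registerTempTable("people")
        sqlContext.sql("SELECT name FROM people WHERE id = 1").collect().foreach(println)

        sc.stop()
      }
    }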

Re: Creating a SchemaRDD from an existing API

Posted by Michael Armbrust <mi...@databricks.com>.
No, it should support any data source that has a schema and can produce
rows.
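
To make that concrete: in the 1.2 data sources API a source is essentially a
relation that declares a schema and can produce an RDD of Rows. The sketch
below assumes the Spark 1.2 interfaces (org.apache.spark.sql.sources.TableScan)
and the 1.2 package layout where StructType, StructField and the primitive
types are visible via org.apache.spark.sql; the relation name, columns and
in-memory rows are invented for illustration, and a real implementation would
call into the external API from buildScan().

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql._
    import org.apache.spark.sql.sources.TableScan

    // Hypothetical relation: any source that can describe its columns and emit
    // matching Rows can be exposed this way, whether or not it is an RDBMS.
    case class ExampleRelation(@transient val sqlContext: SQLContext) extends TableScan {

      // The schema Spark SQL will use for planning and for the table's columns.
      override def schema: StructType = StructType(Seq(
        StructField("id",   IntegerType, nullable = false),
        StructField("name", StringType,  nullable = true)))

      // A real implementation would fetch records from the external API here.
      override def buildScan(): RDD[Row] =
        sqlContext.sparkContext.parallelize(1 to 3).map(i => Row(i, "record-" + i))
    }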

On Mon, Dec 1, 2014 at 1:34 AM, Niranda Perera <ni...@wso2.com> wrote:

> Hi Michael,
>
> About this new data source API, what type of data sources would it
> support? Does it have to be RDBMS necessarily?
>
> Cheers
>
> On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust <michael@databricks.com
> > wrote:
>
>> You probably don't need to create a new kind of SchemaRDD.  Instead I'd
>> suggest taking a look at the data sources API that we are adding in Spark
>> 1.2.  There is not a ton of documentation, but the test cases show how
>> to implement the various interfaces
>> <https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources>,
>> and there is an example library for reading Avro data
>> <https://github.com/databricks/spark-avro>.
>>
>> On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera <ni...@wso2.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am evaluating Spark for an analytic component where we do batch
>>> processing of data using SQL.
>>>
>>> So, I am particularly interested in Spark SQL and in creating a SchemaRDD
>>> from an existing API [1].
>>>
>>> This API exposes elements in a database as datasources. Using the methods
>>> allowed by this data source, we can access and edit data.
>>>
>>> So, I want to create a custom SchemaRDD using the methods and provisions
>>> of
>>> this API. I tried going through Spark documentation and the Java Docs,
>>> but
>>> unfortunately, I was unable to come to a final conclusion if this was
>>> actually possible.
>>>
>>> I would like to ask the Spark Devs,
>>> 1. As of the current Spark release, can we make a custom SchemaRDD?
>>> 2. What is the extension point to a custom SchemaRDD? or are there
>>> particular interfaces?
>>> 3. Could you please point me the specific docs regarding this matter?
>>>
>>> Your help in this regard is highly appreciated.
>>>
>>> Cheers
>>>
>>> [1]
>>>
>>> https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics
>>>
>>> --
>>> *Niranda Perera*
>>> Software Engineer, WSO2 Inc.
>>> Mobile: +94-71-554-8430
>>> Twitter: @n1r44 <https://twitter.com/N1R44>
>>>
>>
>>
>
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>
>

Re: Creating a SchemaRDD from an existing API

Posted by Niranda Perera <ni...@wso2.com>.
Hi Michael,

About this new data sources API, what types of data sources would it support?
Does it necessarily have to be an RDBMS?

Cheers

On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust <mi...@databricks.com>
wrote:

> You probably don't need to create a new kind of SchemaRDD.  Instead I'd
> suggest taking a look at the data sources API that we are adding in Spark
> 1.2.  There is not a ton of documentation, but the test cases show how to
> implement the various interfaces
> <https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources>,
> and there is an example library for reading Avro data
> <https://github.com/databricks/spark-avro>.
>
> On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera <ni...@wso2.com> wrote:
>
>> Hi,
>>
>> I am evaluating Spark for an analytic component where we do batch
>> processing of data using SQL.
>>
>> So, I am particularly interested in Spark SQL and in creating a SchemaRDD
>> from an existing API [1].
>>
>> This API exposes elements in a database as datasources. Using the methods
>> allowed by this data source, we can access and edit data.
>>
>> So, I want to create a custom SchemaRDD using the methods and provisions
>> of
>> this API. I tried going through Spark documentation and the Java Docs, but
>> unfortunately, I was unable to come to a final conclusion if this was
>> actually possible.
>>
>> I would like to ask the Spark Devs,
>> 1. As of the current Spark release, can we make a custom SchemaRDD?
>> 2. What is the extension point to a custom SchemaRDD? or are there
>> particular interfaces?
>> 3. Could you please point me the specific docs regarding this matter?
>>
>> Your help in this regard is highly appreciated.
>>
>> Cheers
>>
>> [1]
>>
>> https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics
>>
>> --
>> *Niranda Perera*
>> Software Engineer, WSO2 Inc.
>> Mobile: +94-71-554-8430
>> Twitter: @n1r44 <https://twitter.com/N1R44>
>>
>
>


-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 <https://twitter.com/N1R44>

Re: Creating a SchemaRDD from an existing API

Posted by Michael Armbrust <mi...@databricks.com>.
You probably don't need to create a new kind of SchemaRDD.  Instead I'd
suggest taking a look at the data sources API that we are adding in Spark
1.2.  There is not a ton of documentation, but the test cases show how to
implement the various interfaces
<https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources>,
and there is an example library for reading Avro data
<https://github.com/databricks/spark-avro>.
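
For reference, a skeleton of what such a library can look like against the 1.2
interfaces: a relation plus a RelationProvider whose DefaultSource class is
what the SQL USING clause resolves. This is a hedged sketch, not the Avro
library's actual code; the package name my.example.datasource, the "words"
option key and the single-column relation are all invented, and a real
provider would open the external data source using the OPTIONS it is passed.

    package my.example.datasource

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql._
    import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}

    // Hypothetical relation exposing a single string column.
    case class WordsRelation(words: Seq[String])(@transient val sqlContext: SQLContext)
      extends TableScan {

      override def schema: StructType =
        StructType(Seq(StructField("word", StringType, nullable = false)))

      // A real implementation would read from the external data source here.
      override def buildScan(): RDD[Row] =
        sqlContext.sparkContext.parallelize(words).map(Row(_))
    }

    // Spark SQL instantiates the class named DefaultSource in the package
    // named by the USING clause and asks it for a relation.
    class DefaultSource extends RelationProvider {
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        WordsRelation(parameters.getOrElse("words", "").split(","))(sqlContext)
    }

Once the jar is on the classpath, the source can be used from SQL much like the
Avro example, e.g. CREATE TEMPORARY TABLE words USING my.example.datasource
OPTIONS (words "spark,sql,sources"), and then queried as an ordinary table.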

On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera <ni...@wso2.com> wrote:

> Hi,
>
> I am evaluating Spark for an analytic component where we do batch
> processing of data using SQL.
>
> So, I am particularly interested in Spark SQL and in creating a SchemaRDD
> from an existing API [1].
>
> This API exposes elements in a database as datasources. Using the methods
> allowed by this data source, we can access and edit data.
>
> So, I want to create a custom SchemaRDD using the methods and provisions of
> this API. I tried going through Spark documentation and the Java Docs, but
> unfortunately, I was unable to come to a final conclusion if this was
> actually possible.
>
> I would like to ask the Spark Devs,
> 1. As of the current Spark release, can we make a custom SchemaRDD?
> 2. What is the extension point to a custom SchemaRDD? or are there
> particular interfaces?
> 3. Could you please point me the specific docs regarding this matter?
>
> Your help in this regard is highly appreciated.
>
> Cheers
>
> [1]
>
> https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>
>
