Posted to dev@spark.apache.org by Niranda Perera <ni...@wso2.com> on 2014/12/01 10:34:55 UTC

Re: Creating a SchemaRDD from an existing API

Hi Michael,

Regarding this new data sources API, what types of data sources would it
support? Does it necessarily have to be an RDBMS?

Cheers

On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust <mi...@databricks.com>
wrote:

> You probably don't need to create a new kind of SchemaRDD.  Instead I'd
> suggest taking a look at the data sources API that we are adding in Spark
> 1.2.  There is not a ton of documentation, but the test cases show how to
> implement the various interfaces
> <https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources>,
> and there is an example library for reading Avro data
> <https://github.com/databricks/spark-avro>.
>
> On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera <ni...@wso2.com> wrote:
>
>> Hi,
>>
>> I am evaluating Spark for an analytic component where we do batch
>> processing of data using SQL.
>>
>> So, I am particularly interested in Spark SQL and in creating a SchemaRDD
>> from an existing API [1].
>>
>> This API exposes elements in a database as data sources. Using the methods
>> provided by this data source, we can access and edit data.
>>
>> So, I want to create a custom SchemaRDD using the methods and provisions
>> of this API. I went through the Spark documentation and the Javadocs, but
>> unfortunately I could not reach a conclusion about whether this is
>> actually possible.
>>
>> I would like to ask the Spark Devs,
>> 1. As of the current Spark release, can we create a custom SchemaRDD?
>> 2. What is the extension point for a custom SchemaRDD? Are there
>> particular interfaces to implement?
>> 3. Could you please point me to the specific docs on this matter?
>>
>> Your help in this regard is highly appreciated.
>>
>> Cheers
>>
>> [1]
>>
>> https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics
>>
>> --
>> *Niranda Perera*
>> Software Engineer, WSO2 Inc.
>> Mobile: +94-71-554-8430
>> Twitter: @n1r44 <https://twitter.com/N1R44>
>>
>
>


-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 <https://twitter.com/N1R44>

Re: Creating a SchemaRDD from an existing API

Posted by Michael Armbrust <mi...@databricks.com>.
No, it does not have to be an RDBMS. It should support any data source that
has a schema and can produce rows.
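
For what it's worth, the wiring in 1.2 is roughly a RelationProvider that the
SQL USING clause looks up, plus a relation that exposes a schema and knows how
to build an RDD of rows. Here is a minimal sketch against that snapshot of the
API; the package name, the class names, the "path" option, and the hard-coded
rows are placeholders of mine, so treat the linked test cases and spark-avro
as the authoritative reference:

    package org.example.datasource  // hypothetical package name

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql._  // Row, SQLContext, StructType, data types in 1.2
    import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}

    // Entry point: USING org.example.datasource is expected to resolve to this class.
    class DefaultSource extends RelationProvider {
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        ExampleRelation(parameters("path"))(sqlContext)
    }

    // A relation only needs to expose a schema and produce rows.
    case class ExampleRelation(path: String)(@transient val sqlContext: SQLContext)
      extends TableScan {

      override def schema: StructType = StructType(Seq(
        StructField("id", IntegerType, nullable = false),
        StructField("name", StringType, nullable = true)))

      // A real source would read from path; this just returns fixed rows.
      override def buildScan(): RDD[Row] =
        sqlContext.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))
    }

Once that is on the classpath, something along the lines of
CREATE TEMPORARY TABLE example USING org.example.datasource OPTIONS (path "/some/dir")
should let you query it from SQL, the same way the Avro library is used.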

On Mon, Dec 1, 2014 at 1:34 AM, Niranda Perera <ni...@wso2.com> wrote:

> Hi Michael,
>
> Regarding this new data sources API, what types of data sources would it
> support? Does it necessarily have to be an RDBMS?
>
> Cheers
>
> On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust <michael@databricks.com
> > wrote:
>
>> You probably don't need to create a new kind of SchemaRDD.  Instead I'd
>> suggest taking a look at the data sources API that we are adding in Spark
>> 1.2.  There is not a ton of documentation, but the test cases show how
>> to implement the various interfaces
>> <https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources>,
>> and there is an example library for reading Avro data
>> <https://github.com/databricks/spark-avro>.
>>
>> On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera <ni...@wso2.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am evaluating Spark for an analytic component where we do batch
>>> processing of data using SQL.
>>>
>>> So, I am particularly interested in Spark SQL and in creating a SchemaRDD
>>> from an existing API [1].
>>>
>>> This API exposes elements in a database as data sources. Using the methods
>>> provided by this data source, we can access and edit data.
>>>
>>> So, I want to create a custom SchemaRDD using the methods and provisions
>>> of this API. I went through the Spark documentation and the Javadocs, but
>>> unfortunately I could not reach a conclusion about whether this is
>>> actually possible.
>>>
>>> I would like to ask the Spark Devs,
>>> 1. As of the current Spark release, can we create a custom SchemaRDD?
>>> 2. What is the extension point for a custom SchemaRDD? Are there
>>> particular interfaces to implement?
>>> 3. Could you please point me to the specific docs on this matter?
>>>
>>> Your help in this regard is highly appreciated.
>>>
>>> Cheers
>>>
>>> [1]
>>>
>>> https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics
>>>
>>> --
>>> *Niranda Perera*
>>> Software Engineer, WSO2 Inc.
>>> Mobile: +94-71-554-8430
>>> Twitter: @n1r44 <https://twitter.com/N1R44>
>>>
>>
>>
>
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>
>
