Posted to dev@spark.apache.org by James Yu <jy...@gmail.com> on 2014/10/08 19:03:11 UTC

will/when Spark/SparkSQL will support ORCFile format

I haven't seen anyone ask this before, but does anyone know whether
Spark/SparkSQL will support the ORCFile format soon? ORCFile is getting
more and more popular in the Hive world.

Thanks,
James

Re: will/when Spark/SparkSQL will support ORCFile format

Posted by James Yu <jy...@gmail.com>.
Sounds great, thanks!



On Thu, Oct 9, 2014 at 2:22 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> Yes, the foreign sources work is only about exposing a stable set of APIs
> for external libraries to link against (to keep the Spark assembly from
> becoming a dependency mess).  The code path these APIs use will be the same
> as that for data sources included in the core Spark SQL library.
>
> Michael

Re: will/when Spark/SparkSQL will support ORCFile format

Posted by Michael Armbrust <mi...@databricks.com>.
Yes, the foreign sources work is only about exposing a stable set of APIs
for external libraries to link against (to keep the Spark assembly from
becoming a dependency mess).  The code path these APIs use will be the same
as that for data sources included in the core Spark SQL library.

Michael
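
[Editor's illustration] The point above, that external sources link against
a stable interface but run through the same code path as built-in ones, can
be pictured with a small sketch. This is not Spark's API: Relation,
register_source, load and both source classes are hypothetical names invented
for this example, and the "engine" is reduced to a single function.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Type


class Relation(ABC):
    """Hypothetical stable interface that every data source implements."""

    @abstractmethod
    def schema(self) -> List[str]:
        ...

    @abstractmethod
    def scan(self) -> Iterator[tuple]:
        ...


# One registry shared by bundled and external sources alike.
_SOURCES: Dict[str, Type[Relation]] = {}


def register_source(name: str, cls: Type[Relation]) -> None:
    _SOURCES[name] = cls


def load(name: str, **options) -> List[tuple]:
    """Single read path: the engine never checks where a source came from."""
    relation = _SOURCES[name](**options)
    return list(relation.scan())


class InMemorySource(Relation):
    """Stands in for a source bundled with the engine."""

    def __init__(self, rows):
        self._rows = rows

    def schema(self) -> List[str]:
        return ["id", "name"]

    def scan(self) -> Iterator[tuple]:
        return iter(self._rows)


class ExternalTextSource(Relation):
    """Stands in for a source shipped by an external library."""

    def __init__(self, text):
        self._text = text

    def schema(self) -> List[str]:
        return ["id", "name"]

    def scan(self) -> Iterator[tuple]:
        # Parse "id,name" lines into typed tuples.
        for line in self._text.splitlines():
            ident, name = line.split(",")
            yield (int(ident), name)


register_source("builtin", InMemorySource)
register_source("external", ExternalTextSource)
```

Both load("builtin", rows=...) and load("external", text=...) go through the
same load function, so in this sketch any difference in cost comes from the
source's own scan, not from the interface.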

On Thu, Oct 9, 2014 at 2:18 PM, James Yu <jy...@gmail.com> wrote:

> Performance-wise, will foreign data formats get the same support as native ones?
>
> Thanks,
> James

Re: will/when Spark/SparkSQL will support ORCFile format

Posted by James Yu <jy...@gmail.com>.
Performance-wise, will foreign data formats get the same support as native ones?

Thanks,
James


On Wed, Oct 8, 2014 at 11:03 PM, Cheng Lian <li...@gmail.com> wrote:

> The foreign data source API PR also matters here:
> https://www.github.com/apache/spark/pull/2475
>
> Foreign data sources like ORC can be added more easily and systematically
> after this PR is merged.

Re: will/when Spark/SparkSQL will support ORCFile format

Posted by Cheng Lian <li...@gmail.com>.
The foreign data source API PR also matters here:
https://www.github.com/apache/spark/pull/2475

Foreign data sources like ORC can be added more easily and systematically
after this PR is merged.

On 10/9/14 8:22 AM, James Yu wrote:
> Thanks Mark! I will keep an eye on it.
>
> @Evan, I have seen people use both formats, so I really want Spark to support
> ORCFile.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: will/when Spark/SparkSQL will support ORCFile format

Posted by James Yu <jy...@gmail.com>.
Thanks Mark! I will keep an eye on it.

@Evan, I have seen people use both formats, so I really want Spark to support
ORCFile.


On Wed, Oct 8, 2014 at 11:12 AM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> https://github.com/apache/spark/pull/2576

Re: will/when Spark/SparkSQL will support ORCFile format

Posted by Mark Hamstra <ma...@clearstorydata.com>.
https://github.com/apache/spark/pull/2576



On Wed, Oct 8, 2014 at 11:01 AM, Evan Chan <ve...@gmail.com> wrote:

> James,
>
> Michael at the meetup last night said there was some development
> activity around ORCFiles.
>
> I'm curious though, what are the pros and cons of ORCFiles vs Parquet?

Re: will/when Spark/SparkSQL will support ORCFile format

Posted by Evan Chan <ve...@gmail.com>.
James,

Michael at the meetup last night said there was some development
activity around ORCFiles.

I'm curious though, what are the pros and cons of ORCFiles vs Parquet?

On Wed, Oct 8, 2014 at 10:03 AM, James Yu <jy...@gmail.com> wrote:
> I haven't seen anyone ask this before, but does anyone know whether
> Spark/SparkSQL will support the ORCFile format soon? ORCFile is getting
> more and more popular in the Hive world.
>
> Thanks,
> James
