Posted to user@orc.apache.org by yanglei <le...@cinfotech.cn> on 2019/03/20 05:43:57 UTC

About ORC API Library

Dear Team,

I am working on a project that uses Go to import data into a Hive table.

To get the best performance, I need this Go program to write ORC files directly to HDFS and then load the data into the Hive table from those ORC files.

I searched Google and GitHub and have not found an ORC writer for Go so far.

Would it be possible to provide an official Go API?

BR

Lei

2019-03-20


Re: About ORC API Library

Posted by Xiening Dai <xn...@live.com>.
Yes, it is.

On Mar 28, 2019, at 11:00 AM, Istvan <le...@gmail.com> wrote:

Yes, this is exactly what I was looking for. Is that a complete implementation?

Thanks,
Istvan

--
the sun shines for all


Re: About ORC API Library

Posted by Istvan <le...@gmail.com>.
Yes, this is exactly what I was looking for. Is that a complete implementation?

Thanks,
Istvan

--
the sun shines for all

> On 2019. Mar 27., at 19:49, Xiening Dai <xn...@live.com> wrote:
> 
> The C++ reader and writer in Apache ORC also have no dependency on Hive, if that's what you are looking for.

Re: About ORC API Library

Posted by Xiening Dai <xn...@live.com>.
The C++ reader and writer in Apache ORC also have no dependency on Hive, if that's what you are looking for.


On Mar 27, 2019, at 10:23 AM, István <le...@gmail.com> wrote:

Thanks both of you!

Regards,
Istvan


Re: About ORC API Library

Posted by István <le...@gmail.com>.
Thanks both of you!

Regards,
Istvan

On Wed, Mar 27, 2019 at 6:05 PM David Phillips <da...@acz.org> wrote:

> Presto has a standalone ORC reader and writer:
> https://github.com/prestosql/presto/tree/master/presto-orc
>
> (Looking at the code now, it has a dependency on the Hive BloomFilter
> class, which should be moved to a test dependency. I'll put up a PR now
> to fix that.)
>
> ----
> David Phillips
> Co-founder @ Presto Software Foundation, Co-creator of Presto
> (https://prestosql.io)


-- 
the sun shines for all

Re: About ORC API Library

Posted by David Phillips <da...@acz.org>.
Presto has a standalone ORC reader and writer:
https://github.com/prestosql/presto/tree/master/presto-orc

(Looking at the code now, it has a dependency on the Hive BloomFilter
class, which should be moved to a test dependency. I'll put up a PR now
to fix that.)

----
David Phillips
Co-founder @ Presto Software Foundation, Co-creator of Presto
(https://prestosql.io)

On Wed, Mar 27, 2019 at 9:36 AM István <le...@gmail.com> wrote:
>
> Sorry to hijack the thread, but is there any Hadoop-free implementation that includes a complete reader and writer?

Re: About ORC API Library

Posted by Karthik Abram <ka...@eclecticlogic.com>.
Github.com/eclecticlogic/eclectic-orc is a Java-based ORC writer that works with annotated classes to produce ORC files. It may be of use if Java is an option.

> On Mar 27, 2019, at 12:35 PM, István <le...@gmail.com> wrote:
> 
> Hi,
> 
> Sorry to hijack the thread, but is there any Hadoop-free implementation that includes a complete reader and writer?
> 
> Thanks,
> Istvan

Re: About ORC API Library

Posted by István <le...@gmail.com>.
Hi,

Sorry to hijack the thread, but is there any Hadoop-free implementation that
includes a complete reader and writer?

Thanks,
Istvan

On Wed, Mar 27, 2019 at 4:41 PM Owen O'Malley <ow...@gmail.com>
wrote:

> You may want to start from their current development rather than starting
> from scratch. It looks like their license is the MIT license, which is very
> permissive.
>
> If you do create a writer, I’d ask you to both keep us informed and
> request a writer id.
>
> https://github.com/apache/orc/blob/d082718a0bf8d703c96f3bf39ea02307498d1800/proto/orc_proto.proto#L211
>
> The writer ids are useful in case we need to work around bugs in a
> particular implementation.
>
> Thanks,
>    Owen

-- 
the sun shines for all

Re: About ORC API Library

Posted by Owen O'Malley <ow...@gmail.com>.
You may want to start from their current development rather than starting from scratch. It looks like their license is the MIT license, which is very permissive.

If you do create a writer, I’d ask you to both keep us informed and request a writer id.
https://github.com/apache/orc/blob/d082718a0bf8d703c96f3bf39ea02307498d1800/proto/orc_proto.proto#L211

The writer ids are useful in case we need to work around bugs in a particular implementation.

Thanks,
   Owen
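
As a rough illustration of why writer ids help, the sketch below shows how a reader might map the id stored in a file footer to an implementation name and decide whether to apply a workaround. The ids 0, 1, and 2 are assumed from the comments in orc_proto.proto and should be checked against the current file. The names writerName and needsWorkaround, and the id 99, are hypothetical and exist only for this example.

```go
package main

import "fmt"

// writerName maps an ORC footer writer id to a readable name.
// Assumption: ids 0, 1 and 2 follow the comments in orc_proto.proto;
// verify them against the current proto file before relying on this.
func writerName(id uint32) string {
	switch id {
	case 0:
		return "ORC Java"
	case 1:
		return "ORC C++"
	case 2:
		return "Presto"
	default:
		return fmt.Sprintf("unknown writer (%d)", id)
	}
}

// needsWorkaround is purely hypothetical. It pretends that the made-up
// writer id 99 once wrote bad data, to show where a reader could branch.
func needsWorkaround(id uint32) bool {
	return id == 99
}

func main() {
	for _, id := range []uint32{0, 1, 2, 99} {
		fmt.Printf("writer %d: %s, workaround needed: %t\n",
			id, writerName(id), needsWorkaround(id))
	}
}
```

A new Go writer would register its own id so that readers could add a branch like this for it if a bug were ever found.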



> On Mar 27, 2019, at 8:16 AM, yanglei <le...@cinfotech.cn> wrote:
> 
> Hi, Owen
>
> I had already checked this library, but its writer is not ready yet. I cannot wait, so I am planning to build a writer myself.
>
> Thanks


Re: About ORC API Library

Posted by yanglei <le...@cinfotech.cn>.
Hi, Owen

I had already checked this library, but its writer is not ready yet. I cannot wait, so I am planning to build a writer myself.

Thanks

From: Owen O'Malley <ow...@gmail.com>
Reply-To: <us...@orc.apache.org>
Date: Wednesday, March 27, 2019, 04:26
To: <us...@orc.apache.org>
Subject: Re: About ORC API Library

You should check out https://github.com/scritchley/orc, which is a native Go implementation of an ORC reader. The README says that a writer is in progress.

.. Owen


Re: About ORC API Library

Posted by Owen O'Malley <ow...@gmail.com>.
You should check out https://github.com/scritchley/orc, which is a native
Go implementation of an ORC reader. The README says that a writer is in
progress.

.. Owen

On Wed, Mar 20, 2019 at 9:56 PM Gang Wu <ga...@apache.org> wrote:

> Hi Lei,
>
> Unfortunately, we don't have a Go binding for the ORC writer. Would it be
> possible for you to use the cgo package in Go to call the C++ API from your
> application?
>
> Thanks,
> Gang

Re: About ORC API Library

Posted by Gang Wu <ga...@apache.org>.
Hi Lei,

Unfortunately, we don't have a Go binding for the ORC writer. Would it be
possible for you to use the cgo package in Go to call the C++ API from your
application?

Thanks,
Gang

On Tue, Mar 19, 2019 at 10:44 PM yanglei <le...@cinfotech.cn> wrote:

> Dear Team,
>
> I am working on a project that uses Go to import data into a Hive table.
>
> To get the best performance, I need this Go program to write ORC files
> directly to HDFS and then load the data into the Hive table from those ORC
> files.
>
> I searched Google and GitHub and have not found an ORC writer for Go so far.
>
> Would it be possible to provide an official Go API?
>
> BR
>
> Lei
>
> 2019-03-20