Posted to user@orc.apache.org by Alessandro D'Armiento <al...@gmail.com> on 2021/03/03 10:14:46 UTC

Serialize List> to ORC byte stream

Good morning,
I am using the orc-core Java library for the first time.
I could not find in the documentation a way to use the ORC file Writer to
create ORC files in memory: instead of writing them to disk, I would like to
keep the resulting byte array and let some other system handle the actual
writing.
Is this natively possible?
I also thought about using something like Google Jimfs to create a file in
memory and then read it back, but that is suboptimal.

Thanks,
Alessandro

Re: Serialize List> to ORC byte stream

Posted by Alessandro D'Armiento <al...@gmail.com>.
Hello Panagiotis,
Thank you very much.
I went with the InMemoryFileSystem solution for the time being. I don't quite
like the idea of having to write and then read back the same bytes, but I'll
use it until that big PR is merged.

Cheers,
Alessandro

On Wed, Mar 3, 2021 at 13:53 Panos Garefalakis <
pangaref@gmail.com> wrote:

> Hey Alessandro,
>
> Welcome to the community!
> Currently ORC has a hard dependency on the Hadoop FS -- so the easiest way
> to use the Writer would be to directly write to disk.
> There is an ongoing effort to remove this (unneeded) dependency -- see
> ORC-508 <https://issues.apache.org/jira/browse/ORC-508> and a fairly
> recent PR <https://github.com/apache/orc/pull/641> by Owen if you want
> to hack around.
> An alternative would be to use a custom in-memory FS as we currently do
> for some tests
> <https://github.com/pgaref/orc/blob/master/java/core/src/test/org/apache/orc/impl/TestPhysicalFsWriter.java#L168>
> .
>
> Hope this helps!
>
> Cheers,
> Panagiotis

Re: Serialize List> to ORC byte stream

Posted by Panos Garefalakis <pa...@gmail.com>.
Hey Alessandro,

Welcome to the community!
Currently ORC has a hard dependency on the Hadoop FS -- so the easiest way
to use the Writer would be to directly write to disk.
There is an ongoing effort to remove this (unneeded) dependency -- see
ORC-508 <https://issues.apache.org/jira/browse/ORC-508> and a fairly
recent PR <https://github.com/apache/orc/pull/641> by Owen if you want to
hack around.
An alternative would be to use a custom in-memory FS as we currently do for
some tests
<https://github.com/pgaref/orc/blob/master/java/core/src/test/org/apache/orc/impl/TestPhysicalFsWriter.java#L168>
.

Hope this helps!

Cheers,
Panagiotis
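
For readers following the in-memory FS suggestion above, here is a minimal
sketch of the underlying idea using only the JDK: a path-keyed store whose
create() hands out a stream that buffers into memory, so the bytes can be
handed to another system afterwards. The class name InMemoryFiles and its
methods are hypothetical and are not part of ORC or Hadoop. To plug this into
the ORC Writer one would, as far as I know, need a subclass of
org.apache.hadoop.fs.FileSystem whose create() wraps such a buffer in an
FSDataOutputStream, passed to the writer via OrcFile.WriterOptions.fileSystem(...);
that wiring is assumed here and not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not an ORC or Hadoop class): path-keyed in-memory "files". */
final class InMemoryFiles {
    private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    /** Returns a stream for the given path, replacing any earlier content. */
    OutputStream create(String path) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        files.put(path, buffer);
        return buffer;
    }

    /** Returns a copy of the bytes written to the given path so far. */
    byte[] bytes(String path) {
        ByteArrayOutputStream buffer = files.get(path);
        if (buffer == null) {
            throw new IllegalArgumentException("No such in-memory file: " + path);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InMemoryFiles fs = new InMemoryFiles();
        try (OutputStream out = fs.create("/tmp/example.orc")) {
            // Stand-in bytes only; a real ORC writer would emit the whole file here.
            out.write("ORC".getBytes(StandardCharsets.US_ASCII));
        }
        byte[] payload = fs.bytes("/tmp/example.orc");
        System.out.println(payload.length + " bytes captured in memory");
    }
}
```

Note that toByteArray() copies the buffer, so this sketch still has the
"write and then read the same bytes" cost mentioned elsewhere in the thread;
it only avoids touching the disk.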
