Posted to dev@parquet.apache.org by David Mollitor <da...@gmail.com> on 2020/01/24 00:39:57 UTC

Writing to Local File

I am usually a user of Parquet through Hive or Spark, but I wanted to sit
down and write my own small example application that uses the library
directly.

Is there some quick way that I can write a Parquet file to the local file
system using java.nio.Path (i.e., with no Hadoop dependencies)?

Thanks!
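
For reference, the kind of thing that works today (with the Hadoop
dependency in place) is a sketch like the one below, using the example
Group API from parquet-hadoop. The schema and file name are made up for
illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class WriteLocalParquet {
  public static void main(String[] args) throws Exception {
    // Illustrative schema; any message type works here.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message example { required int32 id; required binary name (UTF8); }");

    Configuration conf = new Configuration();
    GroupWriteSupport.setSchema(schema, conf);

    // Note: org.apache.hadoop.fs.Path, not java.nio.file.Path.
    try (ParquetWriter<Group> writer = ExampleParquetWriter
        .builder(new Path("file:///tmp/example.parquet"))
        .withConf(conf)
        .withType(schema)
        .build()) {
      SimpleGroupFactory groups = new SimpleGroupFactory(schema);
      writer.write(groups.newGroup().append("id", 1).append("name", "parquet"));
    }
  }
}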

Re: Writing to Local File

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Sounds good! Thanks for filing that issue.

If you'd like to work on separating out the Hadoop code, I'd be happy to
help review. It's something I've been meaning to do for a while.

-- 
Ryan Blue
Software Engineer
Netflix

Re: Writing to Local File

Posted by David Mollitor <da...@gmail.com>.
Thanks, Ryan, for confirming my suspicions.

That would certainly make a quick sample application easier to achieve,
which would help from an adoption perspective.

I just filed this JIRA. I'll leave it open for anyone to jump in on.
https://issues.apache.org/jira/browse/PARQUET-1776

Thanks,
David


Re: Writing to Local File

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
There's currently no way to do this without Hadoop. We've been working
on moving to the `InputFile` and `OutputFile` abstractions so that we can
drop the Hadoop dependency, but Parquet still relies on Hadoop libraries
for compression, and we haven't yet separated the parts of Parquet that
use the new abstractions from the older ones that accept Hadoop Paths.
So for now you need Hadoop on your classpath either way.
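
To make the `OutputFile` half of that concrete: a rough sketch of an
implementation over `java.nio.file.Path` might look like the following.
(`NioOutputFile` is an illustrative name, not an existing class in the
library.)

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import org.apache.parquet.io.OutputFile;
import org.apache.parquet.io.PositionOutputStream;

public class NioOutputFile implements OutputFile {
  private final java.nio.file.Path path;

  public NioOutputFile(java.nio.file.Path path) {
    this.path = path;
  }

  @Override
  public PositionOutputStream create(long blockSizeHint) throws IOException {
    // Fail if the file already exists, matching the Hadoop semantics.
    return wrap(Files.newOutputStream(path, StandardOpenOption.CREATE_NEW));
  }

  @Override
  public PositionOutputStream createOrOverwrite(long blockSizeHint) throws IOException {
    return wrap(Files.newOutputStream(path));
  }

  @Override
  public boolean supportsBlockSize() {
    return false; // a local file has no HDFS-style block size
  }

  @Override
  public long defaultBlockSize() {
    return 0;
  }

  // java.nio streams don't expose their position, so track it ourselves.
  private static PositionOutputStream wrap(OutputStream out) {
    return new PositionOutputStream() {
      private long pos = 0;

      @Override
      public long getPos() {
        return pos;
      }

      @Override
      public void write(int b) throws IOException {
        out.write(b);
        pos++;
      }

      @Override
      public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        pos += len;
      }

      @Override
      public void close() throws IOException {
        out.close();
      }
    };
  }
}

A writer builder that accepts an `OutputFile`, such as
`AvroParquetWriter.builder(OutputFile)`, can then target it directly.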

To get to the point where you can write a file without Hadoop
dependencies, I think we need to create a new module, which
parquet-hadoop would depend on, that holds the `InputFile`/`OutputFile`
implementations. Then we would refactor the Hadoop classes to extend
those implementations, so existing users aren't broken. We'd also need
to implement the compression API directly on top of aircompressor in
this module.
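
And for the compression half, aircompressor's pure-Java codecs are
straightforward to call directly; the real work would be adapting
something like this to Parquet's codec interfaces:

import io.airlift.compress.Compressor;
import io.airlift.compress.snappy.SnappyCompressor;

public class AircompressorSnappyDemo {
  public static void main(String[] args) {
    byte[] page = "some page bytes to compress".getBytes();

    // Pure-Java Snappy; no Hadoop native codecs involved.
    Compressor snappy = new SnappyCompressor();
    byte[] out = new byte[snappy.maxCompressedLength(page.length)];
    int n = snappy.compress(page, 0, page.length, out, 0, out.length);

    System.out.println("compressed " + page.length + " -> " + n + " bytes");
  }
}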

On Thu, Jan 23, 2020 at 4:40 PM David Mollitor <da...@gmail.com> wrote:

> I am usually a user of Parquet through Hive or Spark, but I wanted to sit
> down and write my own small example application of using the library
> directly.
>
> Is there some quick way that I can write a Parquet file to the local file
> system using java.nio.Path (i.e., with no Hadoop dependencies?)
>
> Thanks!
>


-- 
Ryan Blue
Software Engineer
Netflix