Posted to dev@drill.apache.org by Jinfeng Ni <ji...@gmail.com> on 2015/09/01 00:15:46 UTC

Source code for forked parquet library.

It seems we are using a forked parquet library. Can someone point me
to the source code for the fork?

I tried to download the sources from within my IDE, and it reported the
following:

"*Cannot download sources*

Sources not found for:
com.twitter:parquet-column:1.6.0rc3-drill-r0.3

"

So it looks like only the compiled jar is published, not the sources
jar.
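
For reference, an IDE's "download sources" action normally looks for a jar with the "sources" classifier next to the compiled jar in the repository. The Python sketch below only illustrates the standard Maven repository layout for the coordinate in the error message; the function name is hypothetical, and it does not check whether any repository actually hosts the file.

```python
def sources_jar_path(group_id: str, artifact_id: str, version: str) -> str:
    """Return the repository-relative path where a sources jar would live.

    Follows the standard Maven layout:
    <group as dirs>/<artifact>/<version>/<artifact>-<version>-sources.jar
    """
    group_path = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}-sources.jar"
    return f"{group_path}/{artifact_id}/{version}/{file_name}"


# Coordinate taken from the IDE error message in this thread.
path = sources_jar_path("com.twitter", "parquet-column", "1.6.0rc3-drill-r0.3")
print(path)
```

If no file exists at that path in any configured repository, the IDE has nothing to download, which matches the error above.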

Re: Source code for forked parquet library.

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
The source code is here:

https://github.com/mapr/incubator-parquet-mr

The branch is 1.6.0rc3-drill-r0.3.

Thanks

On Mon, Aug 31, 2015 at 3:15 PM, Jinfeng Ni <ji...@gmail.com> wrote:

> It seems we are using a forked parquet library. Can someone point me
> to the source code for the forked parquet ?
>
> I tried to download the source code within IDE, and it complains the
> following:
>
> "*Cannot download sources*
>
> Sources not found for:
> com.twitter:parquet-column:1.6.0rc3-drill-r0.3
>
> "
>
> So, looks like only the compiled code jar is published, but not the source
> code jar file.
>



-- 

Abdelhakim Deneche

Software Engineer


Re: Source code for forked parquet library.

Posted by Jacques Nadeau <ja...@dremio.com>.
If I recall correctly, part of the initial patch for Parquet pushdown focused
on rowgroup pruning during planning. I believe it was based on the old
partition pruning code (I could be wrong). Furthermore, it conflicted with
the behavior of the metadata caching, since the caching didn't (at the time)
require page statistics. With Steven's more recent patch, I believe stats
are now recorded, but I imagine a bunch of refactoring would be needed to
complete the changes. The other part was the filter pushdown in the actual
readers. I don't remember whether there were conflicts there. This is
definitely worth getting merged; I just wanted to give a heads-up on the
potential challenges.
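
To make the dependency on statistics concrete, here is a minimal Python sketch of planning-time rowgroup pruning. It is not the DRILL-1950 patch or any Drill or Parquet API; the class, the function, and the sample values are all hypothetical. It only shows the general idea: if the cached metadata carries min/max values per rowgroup, a range filter can skip rowgroups whose range cannot match, and without those values nothing can be pruned.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical cached min/max of one column for one rowgroup."""
    index: int
    min_value: int
    max_value: int


def prune_row_groups(stats: List[RowGroupStats], lower: int, upper: int) -> List[int]:
    """Return indexes of rowgroups that may contain rows with lower <= col <= upper.

    A rowgroup is kept when its [min, max] range overlaps the filter range;
    rowgroups with no overlap cannot contain matching rows and are skipped.
    """
    return [
        s.index
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Made-up statistics for three rowgroups of a single column.
stats = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, 201, 300),
]
kept = prune_row_groups(stats, lower=150, upper=220)
print(kept)  # [1, 2]
```

Rowgroup 0 is dropped because its maximum (100) is below the filter's lower bound; the other two overlap the range and still have to be read and filtered by the reader.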

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Aug 31, 2015 at 9:22 PM, Jinfeng Ni <ji...@gmail.com> wrote:

> I heard that there are some issues between filter pushdown and the parquet
> metadata caching, but I'm not clear on what exactly the problem is or
> whether we have a plan to resolve it. Can you elaborate on the open
> questions and the conflicts with metadata caching?
>
> The reason I'm looking at the filter pushdown is that a query posted on
> the user list a couple of days ago performed really badly on Drill 1.1
> compared with another system. We did some comparative analysis and think
> the difference mainly comes from the fact that Drill lacks the parquet
> filter pushdown capability. At least for now, the only way for Drill to
> match the other system's performance is to enable filter pushdown for that
> query.
>
> In the meantime, we also identified some room for improvement in Drill's
> run-time generated code when it is used for filter evaluation. I'll submit
> a patch for review shortly.
>
> Regards,
>
> Jinfeng

Re: Source code for forked parquet library.

Posted by Jinfeng Ni <ji...@gmail.com>.
I heard that there are some issues between filter pushdown and the parquet
metadata caching, but I'm not clear on what exactly the problem is or
whether we have a plan to resolve it. Can you elaborate on the open
questions and the conflicts with metadata caching?

The reason I'm looking at the filter pushdown is that a query posted on
the user list a couple of days ago performed really badly on Drill 1.1
compared with another system. We did some comparative analysis and think
the difference mainly comes from the fact that Drill lacks the parquet
filter pushdown capability. At least for now, the only way for Drill to
match the other system's performance is to enable filter pushdown for that
query.

In the meantime, we also identified some room for improvement in Drill's
run-time generated code when it is used for filter evaluation. I'll submit
a patch for review shortly.

Regards,

Jinfeng







On Mon, Aug 31, 2015 at 8:13 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Given that Julien and Jason are working heavily on a merge into Parquet, I
> strongly suggest waiting on merging other patches around that code (or at
> least working on top of the changes they are doing).
>
> I thought there were a number of open questions around the filter pushdown
> and how it related to the metadata caching stuff. Have those been resolved?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio

Re: Source code for forked parquet library.

Posted by Jacques Nadeau <ja...@dremio.com>.
Given that Julien and Jason are working heavily on a merge into Parquet, I
strongly suggest waiting on merging other patches around that code (or at
least working on top of the changes they are doing).

I thought there were a number of open questions around the filter pushdown
and how it related to the metadata caching stuff. Have those been resolved?

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Aug 31, 2015 at 3:25 PM, Jinfeng Ni <ji...@gmail.com> wrote:

> I'm actually trying Adam's parquet filter pushdown patch (DRILL-1950).
> That's why I happened to click on a parquet class and hit the above
> "source code not found" error.
>
> Thanks!

Re: Source code for forked parquet library.

Posted by Jinfeng Ni <ji...@gmail.com>.
By the way, why don't we publish the source code jar to the Maven Nexus repo?
That way, the source code could be downloaded with just one click in the IDE.



On Mon, Aug 31, 2015 at 3:25 PM, Jinfeng Ni <ji...@gmail.com> wrote:

> I'm actually trying Adam's parquet filter pushdown patch (DRILL-1950).
> That's why I happened to click on a parquet class and hit the above
> "source code not found" error.
>
> Thanks!

Re: Source code for forked parquet library.

Posted by Jinfeng Ni <ji...@gmail.com>.
I'm actually trying Adam's parquet filter pushdown patch (DRILL-1950).
That's why I happened to click on a parquet class and hit the above
"source code not found" error.

Thanks!



On Mon, Aug 31, 2015 at 3:20 PM, Jason Altekruse <al...@gmail.com>
wrote:

> https://github.com/mapr/incubator-parquet-mr/tree/1.6.0rc3-drill-r0.3
>
> I am working with Julien Le Dem on getting us off of the fork, but for now
> the source code is accessible here. Let me know if you need any help
> looking through the parquet code. Is there a particular JIRA you are trying
> to address?

Re: Source code for forked parquet library.

Posted by Jason Altekruse <al...@gmail.com>.
https://github.com/mapr/incubator-parquet-mr/tree/1.6.0rc3-drill-r0.3

I am working with Julien Le Dem on getting us off of the fork, but for now
the source code is accessible here. Let me know if you need any help
looking through the parquet code. Is there a particular JIRA you are trying
to address?

On Mon, Aug 31, 2015 at 3:15 PM, Jinfeng Ni <ji...@gmail.com> wrote:

> It seems we are using a forked parquet library. Can someone point me
> to the source code for the forked parquet ?
>
> I tried to download the source code within IDE, and it complains the
> following:
>
> "*Cannot download sources*
>
> Sources not found for:
> com.twitter:parquet-column:1.6.0rc3-drill-r0.3
>
> "
>
> So, looks like only the compiled code jar is published, but not the source
> code jar file.
>