Posted to dev@parquet.apache.org by "Lekshmi Narayanan, Arun Balajiee" <AR...@pitt.edu> on 2020/05/13 23:28:25 UTC

Re: Parquet - 41

Hi

Just wanted to re-confirm that I am working on the C++ implementation of Bloom Filters in Arrow. I don't have the access level to complete this. Could you assign me to this ticket?

https://issues.apache.org/jira/browse/PARQUET-1327
username: encodedgeek

Regards
________________________________
From: Lekshmi Narayanan, Arun Balajiee <AR...@pitt.edu>
Sent: 21 April 2020 06:57
To: dev@parquet.apache.org <de...@parquet.apache.org>
Subject: Re: Parquet - 41

Yes, I would like to contribute to Bloom filters in Arrow.

I also wanted to check: would it be a good idea to add Bloom filters in Column Indices (PARQUET-1404 <https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1404?filter=allopenissues>)?

Regards
Arun Balajiee

________________________________
From: Junjie Chen <ch...@gmail.com>
Sent: 20 April 2020 22:20
To: dev@parquet.apache.org <de...@parquet.apache.org>
Subject: Re: Parquet - 41

As far as I know, it is not implemented yet. The thrift is up to date now;
would you like to contribute?

Things we need are:
1. xxhash c++ implementation
2. reader and writer for the bloom filter
3. filtering logic for row group

Implementing the reader would be a good start.
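
As a rough illustration of items 2 and 3 above, the following is a minimal C++ sketch of a Bloom filter and of row-group filtering built on it. Everything in it is an assumption made for illustration: SimpleBloomFilter and RowGroupsToScan are hypothetical names, not part of the Arrow or Parquet C++ API; std::hash stands in for xxHash; and the filter is a plain multi-probe bit array rather than the split-block layout described in the Parquet format specification. It also keeps filters in memory and does not read or write any file.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical illustration only; not the Arrow/Parquet C++ API.
class SimpleBloomFilter {
 public:
  // num_bits must be greater than zero; num_hashes should be at least 1.
  SimpleBloomFilter(std::size_t num_bits, int num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {}

  // Sets one bit per probe position for the given value.
  void Insert(const std::string& value) {
    for (int i = 0; i < num_hashes_; ++i) {
      bits_[Probe(value, i)] = true;
    }
  }

  // false: the value was definitely never inserted.
  // true:  the value may have been inserted (false positives are possible).
  bool MightContain(const std::string& value) const {
    for (int i = 0; i < num_hashes_; ++i) {
      if (!bits_[Probe(value, i)]) return false;
    }
    return true;
  }

 private:
  // Double hashing: position_i = (h1 + i * h2) mod num_bits.
  // std::hash is a stand-in for xxHash in this sketch.
  std::size_t Probe(const std::string& value, int i) const {
    const std::uint64_t h1 = std::hash<std::string>{}(value);
    const std::uint64_t h2 = (h1 * 0x9E3779B97F4A7C15ULL) | 1ULL;
    const std::uint64_t mixed = h1 + static_cast<std::uint64_t>(i) * h2;
    return static_cast<std::size_t>(mixed % bits_.size());
  }

  std::vector<bool> bits_;
  int num_hashes_;
};

// Given one filter per row group, returns the indices of the row groups
// that still have to be scanned for `value`. Row groups whose filter
// reports "definitely absent" are skipped.
std::vector<std::size_t> RowGroupsToScan(
    const std::vector<SimpleBloomFilter>& filters, const std::string& value) {
  std::vector<std::size_t> to_scan;
  for (std::size_t i = 0; i < filters.size(); ++i) {
    if (filters[i].MightContain(value)) to_scan.push_back(i);
  }
  return to_scan;
}
```

For example, with two filters of 1024 bits and 3 probes each, inserting a value only into the first and then calling RowGroupsToScan for that value always returns the first row group; the second, empty filter reports the value as absent, so that row group is skipped.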

On Tue, Apr 21, 2020 at 8:52 AM <AR...@pitt.edu> wrote:

> Hi
>
> Is the C++ version of bloom filter implemented in Arrow Parquet C++?
>
> https://issues.apache.org/jira/browse/PARQUET-41
> [PARQUET-41] Add bloom filters to parquet statistics - ASF JIRA
> For row groups with no dictionary, we could still produce a bloom filter.
> This could be very useful in filtering entire row groups. Pull request:
> https://github.com/ ...
> Regards
>


--
Best Regards

Re: Parquet - 41

Posted by "Lekshmi Narayanan, Arun Balajiee" <AR...@pitt.edu>.
Hi

I understand the concern. Please don't misunderstand my approach to the problem. I have made progress on the past issues as well; it is just not yet in a presentable form that the community could review without those changes.

Please be assured that I will push these changes in separate patches (I will take extreme care with that) so that there won't be one bulky PR at the end. Each of the changes will be part of its own separate issue and nothing will overlap. While my interest is to contribute to Arrow, I also have to make some of these changes locally on my machine to experiment and get some results for my coursework.

I hope that is understandable.

Regards
________________________________
From: Wes McKinney <we...@gmail.com>
Sent: 14 May 2020 09:44
To: Parquet Dev <de...@parquet.apache.org>
Subject: Re: Parquet - 41

OK -- with the comments about Bloom filters it makes me concerned that the
scope of what you are working on for PARQUET-1404 is expanding without the
past issues being resolved, so we could end up with a large patch that may
not be able to be merged without a lot of additional work. It would be best
to break up the work into smaller patches if possible and work to get them
merged into the project.


Re: Parquet - 41

Posted by Wes McKinney <we...@gmail.com>.
OK -- with the comments about Bloom filters it makes me concerned that the
scope of what you are working on for PARQUET-1404 is expanding without the
past issues being resolved, so we could end up with a large patch that may
not be able to be merged without a lot of additional work. It would be best
to break up the work into smaller patches if possible and work to get them
merged into the project.

On Wed, May 13, 2020, 10:43 PM Lekshmi Narayanan, Arun Balajiee <
ARL122@pitt.edu> wrote:

> Firstly, thanks for adding me.
>
> Yes, I want to do this in relation to PARQUET-1404. I have completed the read
> and write index API, but at the moment, to make it approvable as a PR, I
> have to remove all the other file changes and address your comments as
> well. I can come back to those once I complete my thesis defense at my
> school, if that is okay.
>
> Regards
> Arun Balajiee
>
>
> Regards,
>
> Arun Balajiee
>
> ________________________________
> From: Wes McKinney <we...@gmail.com>
> Sent: Wednesday, May 13, 2020 10:27:29 PM
> To: Parquet Dev <de...@parquet.apache.org>
> Subject: Re: Parquet - 41
>
> I just added Arun as a contributor.
>
> @Arun -- are you planning to do this in relation to PARQUET-1404?
> Where does that project stand?
>
> On Wed, May 13, 2020 at 9:22 PM Junjie Chen <ch...@gmail.com>
> wrote:
> >
> > You need a committer to add you as a contributor to the project. I'm not a
> > committer yet...  @Gabor, could you please help to assign this?
> >
> >
> >
> > --
> > Best Regards
>

Re: Parquet - 41

Posted by "Lekshmi Narayanan, Arun Balajiee" <AR...@pitt.edu>.
To add one more point to that,

With respect to my changes in PARQUET-1404, I am investigating whether introducing the Bloom filter at the column level versus the page level makes any difference. For this reason, I am starting work on PARQUET-41, first to understand how to read and write Bloom filters, and then to implement them at the column level and the page level (inside the page indices).

Regards
________________________________
From: Lekshmi Narayanan, Arun Balajiee <AR...@pitt.edu>
Sent: 13 May 2020 23:43
To: dev@parquet.apache.org <de...@parquet.apache.org>
Subject: Re: Parquet - 41

Firstly, thanks for adding me.

Yes, I want to do this in relation to PARQUET-1404. I have completed the read and write index API, but at the moment, to make it approvable as a PR, I have to remove all the other file changes and address your comments as well. I can come back to those once I complete my thesis defense at my school, if that is okay.

Regards
Arun Balajiee


Regards,

Arun Balajiee


Re: Parquet - 41

Posted by "Lekshmi Narayanan, Arun Balajiee" <AR...@pitt.edu>.
Firstly, thanks for adding me.

Yes, I want to do this in relation to PARQUET-1404. I have completed the read and write index API, but at the moment, to make it approvable as a PR, I have to remove all the other file changes and address your comments as well. I can come back to those once I complete my thesis defense at my school, if that is okay.

Regards
Arun Balajiee


Regards,

Arun Balajiee

________________________________
From: Wes McKinney <we...@gmail.com>
Sent: Wednesday, May 13, 2020 10:27:29 PM
To: Parquet Dev <de...@parquet.apache.org>
Subject: Re: Parquet - 41

I just added Arun as a contributor.

@Arun -- are you planning to do this in relation to PARQUET-1404?
Where does that project stand?

On Wed, May 13, 2020 at 9:22 PM Junjie Chen <ch...@gmail.com> wrote:
>
> You need a committer to add you as a contributor to the project. I'm not a
> committer yet...  @Gabor, could you please help to assign this?
>
>
>
> --
> Best Regards

Re: Parquet - 41

Posted by Wes McKinney <we...@gmail.com>.
I just added Arun as a contributor.

@Arun -- are you planning to do this in relation to PARQUET-1404?
Where does that project stand?

On Wed, May 13, 2020 at 9:22 PM Junjie Chen <ch...@gmail.com> wrote:
>
> You need a committer to add you as a contributor to the project. I'm not a
> committer yet...  @Gabor, could you please help to assign this?
>
>
>
> --
> Best Regards

Re: Parquet - 41

Posted by Junjie Chen <ch...@gmail.com>.
You need a committer to add you as a contributor to the project. I'm not a
committer yet...  @Gabor, could you please help to assign this?



-- 
Best Regards