Posted to dev@parquet.apache.org by Xinli shang <sh...@uber.com.INVALID> on 2020/09/22 17:49:14 UTC
Parquet sync meeting notes - 9/22/2020
Hi all,
Attendees: Ashish Singh, Julien, Gidon, Gabor, Xinli
1. Column Encryption
   1. PRs are all merged.
   2. Data masking
      1. This feature should take a top-down approach, starting from
         the service. Maybe we can start with parquet-tools.
      2. Reducing the storage overhead for data masking: there are
         some ideas about different compressions and tools.
      3. Testing: Writing more tests and testing with real data are
         usually good approaches. Maybe we can ask some third parties
         who don’t know this feature to test it by working from the
         specification. Another way is to have interoperability tests
         between the C++ and Java implementations. Gidon has done some
         in his work. It would be great to have automation for this so
         that we can test more.
2. Parquet 1.12 release
   1. Created a Jira ticket for the release.
   2. There is concern about the adoption of Parquet 1.11.0 because
      of the Avro version. Action item (owner: Xinli): bring people
      dealing with this upgrade effort to the next meeting to discuss
      the next step.
3. PARQUET-313: Implement 3-level list.
   1. This is a good initiative. It is a little hard for current
      committers to review because we don’t use it or aren’t familiar
      with it. Check whether somebody from Twitter can help with the
      review and verification. Action item (owner: Julien): Julien
      will try to find someone.
   2. Use a flag in the code to make it optional, to reduce the risk.
Please let me know if you have any questions.
Xinli Shang | Tech Lead Manager @ Uber Data Infra
--
Xinli Shang