Posted to dev@hbase.apache.org by "Burd, Roni" <ro...@amazon.com.INVALID> on 2020/03/02 18:27:04 UTC

Parquet vs HFile

Has anyone looked at leveraging Parquet files to replace HFiles? I recognize that HFiles may be more advanced for the HBase case, but my assumption is that Parquet can be evolved as well.

This would also help HBase align better with a more widely adopted industry standard.

Thoughts?

RE: Parquet vs HFile

Posted by "Burd, Roni" <ro...@amazon.com.INVALID>.
I agree it is expensive, and the sole purpose would be "industry standard", as in I could query the HFile/Parquet data more easily with any tool.

I can't help but look at what Hudi and Delta Lake did and feel they are doing basically the same things as HBase (compaction, WAL, etc.), but without the stronger isolation and performance characteristics.



-----Original Message-----
From: Stack <st...@duboce.net> 
Sent: Tuesday, March 3, 2020 1:59 PM
To: HBase Dev List <de...@hbase.apache.org>
Subject: RE: [EXTERNAL]Parquet vs HFile


On Mon, Mar 2, 2020 at 10:27 AM Burd, Roni <ro...@amazon.com.invalid>
wrote:

> Has anyone looked at leveraging Parquet files to replace HFiles? I 
> recognize that HFiles may be more advanced for the hbase case, but my 
> assumption is that Parquet can be evolved as well.
>
> This would also help hfiles align better with a more widely adopted 
> industry standard.
>
> Thoughts?
>

I'd think the mismatch between the formats would be expensive for little benefit other than 'industry standard' unless work was done to teach HBase about columns at least as far up as the HBase 'block' as described in the 'Ressi data layout' in [1].
Thanks,
S

 1. https://dl.acm.org/doi/pdf/10.1145/3035918.3056103

Re: Parquet vs HFile

Posted by Stack <st...@duboce.net>.
On Mon, Mar 2, 2020 at 10:27 AM Burd, Roni <ro...@amazon.com.invalid>
wrote:

> Has anyone looked at leveraging Parquet files to replace HFiles? I
> recognize that HFiles may be more advanced for the hbase case, but my
> assumption is that Parquet can be evolved as well.
>
> This would also help hfiles align better with a more widely adopted
> industry standard.
>
> Thoughts?
>

I'd think the mismatch between the formats would be expensive for little
benefit other than 'industry standard' unless work was done to teach HBase
about columns at least as far up as the HBase 'block' as described in the
'Ressi data layout' in [1].
Thanks,
S

 1. https://dl.acm.org/doi/pdf/10.1145/3035918.3056103