Posted to dev@parquet.apache.org by Chin Wei Low <lo...@gmail.com> on 2015/04/20 04:14:54 UTC

First read of parquet file is slow

Hi,

I am reading the metadata of a few Parquet files on the local file system. It
takes a long time to read the first file, while subsequent reads of the other
files are fast. All files are about the same size, and the first read takes
about 10 times as long as the later ones.

May I know why this can happen?

Regards,
Chin Wei

Re: First read of parquet file is slow

Posted by Ryan Blue <bl...@cloudera.com>.
On 04/19/2015 07:14 PM, Chin Wei Low wrote:
> Hi,
>
> I am reading the metadata of a few Parquet files on the local file system. It
> takes a long time to read the first file, while subsequent reads of the other
> files are fast. All files are about the same size, and the first read takes
> about 10 times as long as the later ones.
>
> May I know why this can happen?
>
> Regards,
> Chin Wei
>

I don't know for certain. My best guess is that the library is structured
to avoid branching where possible, so there are a lot of classes that
override just one of a superclass's type-specific methods, like addInt,
addLong, etc. All of those extra method calls can eventually be inlined
and optimized, but it might take the JVM some time to figure that out.
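
To make the guess above concrete, here is a minimal sketch of that class shape and of how to time repeated runs of the same work. The names WarmupSketch, ValueSink, IntSink and LongSink are hypothetical stand-ins invented for illustration; they are not Parquet classes, and the sketch does not read any Parquet file. On a typical JIT-compiling JVM the first round or two usually take longer than later ones, but the exact numbers depend on the JVM and machine.

```java
public class WarmupSketch {
    // Hypothetical base class: one method per primitive type, each rejecting
    // the call unless a subclass overrides it.
    static abstract class ValueSink {
        void addInt(int v) { throw new UnsupportedOperationException("addInt"); }
        void addLong(long v) { throw new UnsupportedOperationException("addLong"); }
        abstract long total();
    }

    // Each subclass overrides exactly one of the type-specific methods.
    static final class IntSink extends ValueSink {
        private long sum;
        @Override void addInt(int v) { sum += v; }
        @Override long total() { return sum; }
    }

    static final class LongSink extends ValueSink {
        private long sum;
        @Override void addLong(long v) { sum += v; }
        @Override long total() { return sum; }
    }

    // Pushes the values 0..n-1 through both sinks via virtual calls.
    static long runOnce(int n) {
        ValueSink ints = new IntSink();
        ValueSink longs = new LongSink();
        for (int i = 0; i < n; i++) {
            ints.addInt(i);
            longs.addLong(i);
        }
        return ints.total() + longs.total();
    }

    // Times several identical rounds; returns elapsed nanoseconds per round.
    static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            long result = runOnce(n);
            nanos[r] = System.nanoTime() - start;
            // Use the result so the work cannot be discarded as dead code.
            if (result < 0) throw new IllegalStateException("unexpected overflow");
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 1_000_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.printf("round %d: %d us%n", r, nanos[r] / 1_000);
        }
    }
}
```

If the per-round times drop sharply after the first rounds, that is consistent with warm-up cost, though it does not by itself show that the same thing is happening in the Parquet read path.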

You could try running with JVM flags that change how aggressive the
optimizer is and see if that changes the performance profile. That would
only have an effect if the slowdown is actually caused by the
optimization and JIT thresholds in the JVM.
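
One way to test that hypothesis is sketched below: it prints the flags the JVM was started with and compares wall-clock time against the JIT compilation time the JVM reports through the standard java.lang.management API. The class name JitCheck and the placeholder task are assumptions made for illustration; the task would need to be replaced with the actual metadata read. The flags named in the comments are HotSpot options, so this assumes a HotSpot-based JVM.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class JitCheck {
    // Accumulated JIT compilation time in milliseconds as reported by the JVM,
    // or -1 if the JVM has no compiler or does not monitor compilation time.
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    // Runs the task once; returns { wall-clock millis, JIT millis spent meanwhile }.
    // The second element is -1 when JIT time is not available.
    static long[] measure(Runnable task) {
        long jitBefore = jitMillis();
        long start = System.nanoTime();
        task.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long jitAfter = jitMillis();
        long jitDelta = (jitBefore < 0 || jitAfter < 0) ? -1 : jitAfter - jitBefore;
        return new long[] { wallMillis, jitDelta };
    }

    public static void main(String[] args) {
        // HotSpot options worth comparing between runs (assumes HotSpot):
        //   -XX:+PrintCompilation      log each method as it is compiled
        //   -Xint                      interpreter only, no JIT at all
        //   -XX:TieredStopAtLevel=1    stop at the quick C1 compiler
        //   -XX:CompileThreshold=<n>   invocation count before compiling
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM flags: " + flags);

        // Placeholder: replace the body with the real metadata read for one file.
        Runnable task = () -> { };

        for (int i = 0; i < 5; i++) {
            long[] m = measure(task);
            System.out.printf("run %d: wall=%d ms, jit=%d ms%n", i, m[0], m[1]);
        }
    }
}
```

The reported JIT time is approximate and compilation normally runs on background threads, so the two numbers are not directly additive. A large wall time on the first run with only a small JIT delta would still point away from the JIT explanation.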

rb

-- 
Ryan Blue
Software Engineer
Cloudera, Inc.