Posted to user@drill.apache.org by Uwe Korn <uw...@xhochy.com> on 2016/09/07 12:30:20 UTC

Cannot load Parquet files created with parquet-cpp in Drill

Hello,

I'm currently looking at the correctness of our C++ implementation of 
Parquet and noticed that I cannot load these files in Drill. Although 
this is probably a bug in the C++ implementation, I don't understand 
what causes the error. Using the Java parquet-tools, I can read these 
files. I'm using Apache Drill 1.8.0 on OSX.

I've posted the error output from Drill and the parquet file as a gist: 
https://gist.github.com/xhochy/d4441a5ff2025b877df43fecd4466a11

If anyone could take a quick look at this and tell me why Drill cannot
read the file, it would really help me fix the parquet-cpp issues.

Kind Regards,

Uwe


Re: Cannot load Parquet files created with parquet-cpp in Drill

Posted by Uwe Korn <uw...@xhochy.com>.
Happy to report back that this is really a parquet-cpp issue and not 
something in Drill. Kudos to Deepak Majeti for finding that we did not 
set the dictionary_page_offset in the C++ code.
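
For readers following along, here is a minimal sketch of why a missing dictionary_page_offset can trip up a reader. It assumes a reader that treats dictionary_page_offset, when set, as the start of the column chunk and otherwise falls back to data_page_offset. The field names follow the Parquet format's column metadata; the struct, the flag, and FirstPageOffset are hypothetical illustrations, not the actual parquet-cpp or Drill code.

```cpp
#include <cstdint>

// Simplified stand-in for the column chunk metadata in a Parquet footer.
// Hypothetical type: only the two offsets relevant here are modeled.
struct ColumnChunkMeta {
  std::int64_t data_page_offset;        // required field in the format
  bool has_dictionary_page_offset;      // whether the optional field was written
  std::int64_t dictionary_page_offset;  // only meaningful if the flag is true
};

// Hypothetical reader helper: pick the file offset at which to start
// reading the pages of a column chunk.
// Assumption: the dictionary page, if any, precedes the first data page.
std::int64_t FirstPageOffset(const ColumnChunkMeta& meta) {
  if (meta.has_dictionary_page_offset &&
      meta.dictionary_page_offset > 0 &&
      meta.dictionary_page_offset < meta.data_page_offset) {
    return meta.dictionary_page_offset;
  }
  // Without the optional offset, the reader starts at the first data page
  // and never sees a dictionary page written in front of it.
  return meta.data_page_offset;
}
```

With made-up offsets (dictionary page at 4, first data page at 60), FirstPageOffset yields 4 when the writer sets the field and 60 when it does not. In the second case a reader of this kind would meet dictionary-encoded data pages without having loaded the dictionary.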

Uwe

On 07.09.16 21:08, Kunal Khatua wrote:
> Hi Uwe
>
> I believe you're using the latest Apache Drill 1.8.0. From a quick look at the stack trace, it appears to be a potential bug in Drill's interpretation of dictionary-encoded data.
>
> One way to verify that your C++ implementation of Parquet is correct would be to generate your data without dictionary encoding and then check whether Drill can read it.
>
> Regards
> Kunal
>
> On Wed 7-Sep-2016 5:30:32 AM, Uwe Korn <uw...@xhochy.com> wrote:
> Hello,
>
> I'm currently looking at the correctness of our C++ implementation of
> Parquet and noticed that I cannot load these files in Drill. Although
> this is probably a bug in the C++ implementation, I don't understand
> what causes the error. Using the Java parquet-tools, I can read these
> files. I'm using Apache Drill 1.8.0 on OSX.
>
> I've posted the error output from Drill and the parquet file as a gist:
> https://gist.github.com/xhochy/d4441a5ff2025b877df43fecd4466a11
>
> If anyone could take a quick look at this and tell me why Drill cannot
> read the file, it would really help me fix the parquet-cpp issues.
>
> Kind Regards,
>
> Uwe
>
>




Re: Cannot load Parquet files created with parquet-cpp in Drill

Posted by Kunal Khatua <kk...@maprtech.com>.
Hi Uwe

I believe you're using the latest Apache Drill 1.8.0. From a quick look at the stack trace, it appears to be a potential bug in Drill's interpretation of dictionary-encoded data.

One way to verify that your C++ implementation of Parquet is correct would be to generate your data without dictionary encoding and then check whether Drill can read it.

Regards
Kunal

On Wed 7-Sep-2016 5:30:32 AM, Uwe Korn <uw...@xhochy.com> wrote:
Hello,

I'm currently looking at the correctness of our C++ implementation of
Parquet and noticed that I cannot load these files in Drill. Although
this is probably a bug in the C++ implementation, I don't understand
what causes the error. Using the Java parquet-tools, I can read these
files. I'm using Apache Drill 1.8.0 on OSX.

I've posted the error output from Drill and the parquet file as a gist:
https://gist.github.com/xhochy/d4441a5ff2025b877df43fecd4466a11

If anyone could take a quick look at this and tell me why Drill cannot
read the file, it would really help me fix the parquet-cpp issues.

Kind Regards,

Uwe

