Posted to dev@parquet.apache.org by Ryan Blue <rb...@netflix.com.INVALID> on 2017/02/15 23:29:17 UTC

Re: Use pig to load original parquet file

I don't think any of the object models or processing engines other than
parquet-protobuf support lists that aren't the 3-level format from the
spec. The behavior is specified, but I don't think anyone has implemented
support yet. If you want to contribute support, we can help you out and
review. Otherwise, I'd recommend rewriting the data to use the more
standard 3-level list representation.
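
For illustration, here is a rough sketch of what the rewritten schema could
look like in the 3-level form. It assumes the layout from the LogicalTypes
spec, where the group names "list" and "element" come from the spec and not
from the original file. It also assumes the bare repeated dpi field is read
as a required list of required elements, per the spec's
backward-compatibility rules:

    optional binary firmware_version (UTF8);
    required group dpi (LIST) {
        repeated group list {
            required group element {
                required binary name (UTF8);
                optional boolean on;
                optional int64 time;
            }
        }
    }

Whether a reader then exposes dpi as a bag of tuples depends on the object
model, so treat this as a starting point rather than a verified fix.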

rb

On Tue, Feb 14, 2017 at 10:25 PM, abel_ke@trend.com.tw <abel_ke@trend.com.tw> wrote:

> Hi,
> When I use org.apache.parquet.pig.ParquetLoader in parquet-mr to read the
> original Parquet file, it always fails with a schema error. I know the
> problem is that the original Parquet file doesn't have the Pig schema
> (tuple and bag), like:
>
>
>
> pig schema:
>
> optional binary firmware_version (UTF8);
> optional group dpi (LIST) {
>     repeated group dpi_tuple {
>         optional binary name (UTF8);
>         optional int32 on;
>         optional int64 time;
>     }
> }
>
> parquet schema:
>
> optional binary firmware_version (UTF8);
> repeated group dpi {
>     required binary name (UTF8);
>     optional boolean on;
>     optional int64 time;
> }
>
> Is there any way to load the original Parquet file in Pig without
> changing the file schema?
>




-- 
Ryan Blue
Software Engineer
Netflix