Posted to dev@parquet.apache.org by Ryan Blue <rb...@netflix.com.INVALID> on 2017/02/15 23:29:17 UTC
Re: Use pig to load original parquet file
I don't think any of the object models or processing engines other than
parquet-protobuf support lists that aren't the 3-level format from the
spec. The behavior is specified, but I don't think anyone has implemented
support yet. If you want to contribute support, we can help you out and
review. Otherwise, I'd recommend rewriting the data to use the more
standard 3-level list representation.
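
As a rough illustration of that rewrite, here is a minimal Python sketch that
renders the 3-level layout for the dpi field from the quoted schema below. It
only builds the schema text; the data itself would still have to be rewritten
with whatever tool you use. The helper name to_three_level_list is
hypothetical, and marking both the list and its elements as required is my
reading of the spec's backward-compatibility rule for a bare repeated group,
so treat it as an assumption.

```python
def to_three_level_list(name, element_fields, indent="  "):
    """Render a legacy `repeated group <name> { ... }` as a 3-level LIST group.

    Hypothetical helper for illustration only: it produces schema text and
    does not touch any Parquet file.
    """
    lines = [
        # Outer group carries the LIST annotation (assumed required, see lead-in).
        f"required group {name} (LIST) {{",
        # Middle level: the single repeated group, conventionally named "list".
        f"{indent}repeated group list {{",
        # Inner level: the element, here a group holding the original fields.
        f"{indent * 2}required group element {{",
    ]
    lines += [f"{indent * 3}{field}" for field in element_fields]
    lines += [f"{indent * 2}}}", f"{indent}}}", "}"]
    return "\n".join(lines)


# Fields copied from the original file's `repeated group dpi`.
legacy_fields = [
    "required binary name (UTF8);",
    "optional boolean on;",
    "optional int64 time;",
]

schema_text = to_three_level_list("dpi", legacy_fields)
print(schema_text)
```

The three nesting levels (annotated group, repeated list, element) are what
let a reader tell an empty list from a null one, which the bare repeated
group cannot express.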
rb
On Tue, Feb 14, 2017 at 10:25 PM, abel_ke@trend.com.tw <abel_ke@trend.com.tw> wrote:
> Hi,
> When I use org.apache.parquet.pig.ParquetLoader in parquet-mr to read the
> original Parquet file, it always fails with a schema error. I know the
> problem is that the original Parquet file doesn't have the Pig schema
> (tuple and bag), like:
>
>
>
> pig schema:
>
> optional binary firmware_version (UTF8);
> optional group dpi (LIST) {
>   repeated group dpi_tuple {
>     optional binary name (UTF8);
>     optional int32 on;
>     optional int64 time;
>   }
> }
>
> parquet schema:
>
> optional binary firmware_version (UTF8);
> repeated group dpi {
>   required binary name (UTF8);
>   optional boolean on;
>   optional int64 time;
> }
>
> Is there any way to load the original Parquet file in Pig without changing
> the file schema?
>
--
Ryan Blue
Software Engineer
Netflix