Posted to dev@pig.apache.org by Олексій Саянкін <ol...@gmail.com> on 2015/07/06 18:34:32 UTC

pig and parquet-bundle*jar

Hi team!

I have found a strange issue when using Pig with Parquet files. There is no
parquet-bundle*jar in the pig/lib folder, so I have to add it manually to
avoid this exception:

pig script failed to validate:
org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not
resolve parquet.pig.ParquetLoader using imports: [, java.lang.,
org.apache.pig.builtin., org.apache.pig.impl.builtin.]
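
For illustration, the manual workaround can be sketched as a small shell
helper that locates the bundle jar and prints a Pig REGISTER statement for it.
This is only a sketch under assumptions: the function names are hypothetical,
the jar is assumed to have been downloaded by hand into some directory, and the
file name pattern parquet-pig-bundle-*.jar is taken from later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the first parquet bundle jar found in a directory.
# Assumption: the jar follows the parquet-pig-bundle-*.jar naming pattern.
find_parquet_bundle() {
    for jar in "$1"/parquet-pig-bundle-*.jar; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: emit a REGISTER statement for the jar that was found,
# so that a script can resolve parquet.pig.ParquetLoader.
register_line() {
    jar=$(find_parquet_bundle "$1") || return 1
    printf 'REGISTER %s;\n' "$jar"
}
```

The printed line can be placed at the top of the Pig script that uses
parquet.pig.ParquetLoader. The helper returns a non-zero status when no
matching jar is found.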

I have looked through the build.xml files from pig-0.12 to pig-0.15 and found
that parquet-bundle*jar is only a compile-time dependency. Ant does not
copy parquet-bundle*jar to the lib folder. A similar issue is described at
https://issues.apache.org/jira/browse/PIG-3445 (see the last comment in the
thread).

So my question is: was the absence of the parquet-bundle*jar file
intentional, or do we have a bug here?

Thanks.
Oleksiy Sayankin.

Re: pig and parquet-bundle*jar

Posted by Daniel Dai <da...@hortonworks.com>.
The idea is to include in lib only the dependencies of the most popular
UDFs/LoadFuncs/StoreFuncs. If Pig included the dependencies of every existing
Pig UDF/LoadFunc/StoreFunc, it might end up bundling too many jars.
Clearly, popularity is not a measurable term and it will change over time,
so we make such decisions subjectively. If we get enough interest in a
particular UDF/LoadFunc/StoreFunc, we can include its dependencies.

This is independent of the fatjar change. The fatjar change makes dependent
jars more transparent to users, so users will not hit mysterious jar
conflicts due to the uber pig.jar.

Thanks,
Daniel

On 7/7/15, 4:40 AM, "Lorand Bendig" <lb...@gmail.com> wrote:

>
>Hi Oleksiy,
>
>Initially, the idea was not to add an additional dependency to
>the pig fatjar. Instead, let the
>user ship the necessary parquet bundle.
>However, with PIG-3737 the dependent jars are now copied to the
>$PIG_HOME/lib directory.
>I suspect you are right: the patch in PIG-3737 needs to be extended
>so that parquet-pig-bundle-*.jar
>is in the /lib directory as well.
>On the other hand, it would also be great to bump the parquet-bundle version
>from 1.2.3 to 1.7.0.
>
>@Daniel, what do you think?
>
>Thanks,
>Lorand
>


Re: pig and parquet-bundle*jar

Posted by Lorand Bendig <lb...@gmail.com>.
Hi Oleksiy,

Initially, the idea was not to add an additional dependency to
the pig fatjar. Instead, let the
user ship the necessary parquet bundle.
However, with PIG-3737 the dependent jars are now copied to the
$PIG_HOME/lib directory.
I suspect you are right: the patch in PIG-3737 needs to be extended
so that parquet-pig-bundle-*.jar
is in the /lib directory as well.
On the other hand, it would also be great to bump the parquet-bundle version
from 1.2.3 to 1.7.0.
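
Until the build copies the bundle itself, the same effect can be approximated
by hand. The following is a minimal sketch, not part of any patch: the function
name and its arguments are hypothetical, and it assumes (as this thread
suggests) that placing the jar in $PIG_HOME/lib is enough for Pig to find it.

```shell
#!/bin/sh
# Hypothetical helper: copy a parquet bundle jar into a Pig installation's
# lib directory unless a jar with the same file name is already there.
#   $1: path to the bundle jar (assumed to have been obtained separately)
#   $2: Pig installation directory, i.e. the value of $PIG_HOME
install_parquet_bundle() {
    src_jar=$1
    lib_dir="$2/lib"

    if [ ! -f "$src_jar" ]; then
        echo "missing jar: $src_jar" >&2
        return 1
    fi
    # Refuse to create lib ourselves: a missing lib directory most likely
    # means that $2 does not point to a Pig installation.
    if [ ! -d "$lib_dir" ]; then
        echo "no lib directory: $lib_dir" >&2
        return 1
    fi

    dest="$lib_dir/$(basename "$src_jar")"
    if [ -f "$dest" ]; then
        echo "already present: $dest"
        return 0
    fi
    cp "$src_jar" "$dest" && echo "copied: $dest"
}
```

A call such as install_parquet_bundle /path/to/parquet-pig-bundle-1.2.3.jar "$PIG_HOME"
(the jar path here is a placeholder) leaves an existing copy untouched, so it
can be run repeatedly.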

@Daniel, what do you think?

Thanks,
Lorand
