Posted to user@pig.apache.org by John Smith <le...@gmail.com> on 2015/12/16 18:15:10 UTC

Problem with schema generation - outputSchema

Hi all,


My intention is to write a generic Pig script, using a UDF, that can process
CSV files with a different number of fields per file. Each run of the script
processes one type of input file. The UDF will produce a bag with two tuples;
the number of fields inside each tuple depends on the internal logic of the
UDF.

My problem is that I can't pass any temporary variable from the exec()
method to the outputSchema(Schema input) method, which is part of the same
UDF class. The temporary variable contains the information needed to generate
a valid output schema inside outputSchema(), e.g. the size of the tuples, the
field names, the data types, etc.
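
The situation can be sketched in plain Java. This is only an illustration under
stated assumptions: the class below is a hypothetical stand-in that does not
extend Pig's EvalFunc, its two methods merely mimic the roles of outputSchema()
and exec(), and the "f0, f1, ..." field names and chararray type are made up.
It assumes, as appears to be the case in Pig, that the schema is requested
while the script is planned, before any record has been processed, and
possibly on a different instance than the one that later runs exec().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UDF class. It does NOT extend Pig's EvalFunc;
// the two methods only mimic the roles of outputSchema() and exec().
class GenericCsvUdfSketch {
    // Filled in only by exec(), i.e. only once records are being processed.
    private List<String> fieldNames = null;

    // Mimics outputSchema(Schema input): describes the output fields.
    String outputSchema() {
        if (fieldNames == null) {
            return "unknown"; // exec() has not run on this instance
        }
        return String.join(", ", fieldNames);
    }

    // Mimics exec(Tuple input): splits one CSV line and records its layout.
    List<String> exec(String csvLine) {
        String[] values = csvLine.split(",", -1); // -1 keeps trailing empty fields
        fieldNames = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            fieldNames.add("f" + i + ": chararray"); // made-up names and type
        }
        return Arrays.asList(values);
    }
}

class SchemaTimingSketch {
    public static void main(String[] args) {
        // Planning: the schema is requested before any record has been read.
        GenericCsvUdfSketch planner = new GenericCsvUdfSketch();
        System.out.println("at planning time: " + planner.outputSchema());

        // Execution: a separate instance processes the data later.
        GenericCsvUdfSketch worker = new GenericCsvUdfSketch();
        worker.exec("a,b,c");
        System.out.println("after processing: " + worker.outputSchema());
    }
}
```

In this sketch the layout discovered inside exec() exists only on the instance
that processed data, and only after the schema was already needed, which is
the gap described above.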

Is there a solution, or a more efficient way to solve this?


Thank you