Posted to user@pig.apache.org by Aniruddh Sharma <as...@gmail.com> on 2015/03/12 14:14:14 UTC

Query on Split and RecordReader Functionality

Hi

I am trying to write a custom UDF in Pig to load a video file.
I am extending the class PigTextInputFormat so that I can control how the
input is split and supply a custom record reader.

As a video file is unstructured, I do not know where it will be split,
or whether individual frames will cross the boundary between two
splits.

My queries are as follows:

a) I want to split the input according to my own requirement. I have
overridden computeSplitSize and added a print statement to it. The method
is being called, because my message is printed, but the input is not split
according to my return value; it is still split on block size only. Please
guide me on which function I need to override to control the splits.
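
For context on query (a), here is a simplified model, in plain Java with
no Hadoop classes, of how the stock Hadoop FileInputFormat.getSplits turns
the value returned by computeSplitSize into split offsets. The class name
SplitSizeSketch and the method name splits are illustrative only, and the
sketch assumes the default formula and a slop factor of 1.1. It does not
model anything Pig adds on top of the loader's input format.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of stock Hadoop split computation (illustrative names).
class SplitSizeSketch {
    // Assumed slop factor: the last split may be up to 10% larger.
    static final double SPLIT_SLOP = 1.1;

    // Default formula used by FileInputFormat.computeSplitSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {offset, length} pairs for one file of the given length.
    static List<long[]> splits(long fileLength, long splitSize) {
        List<long[]> out = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            out.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            // Whatever is left becomes the final (possibly short) split.
            out.add(new long[] {fileLength - remaining, remaining});
        }
        return out;
    }

    public static void main(String[] args) {
        long size = computeSplitSize(128, 1, Long.MAX_VALUE);
        for (long[] s : splits(300, size)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

In this model the split boundaries depend only on the file length and the
computed split size, so a custom return value from computeSplitSize would
change the offsets. If the observed splits still follow the block size, the
layer that calls getSplits may be worth checking.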

b) Suppose I let the data split at block size and the last record of my
unstructured data crosses the boundary between two splits. If I supply my
own RecordReader, do I have to write special code in it to fetch the rest
of that record (the part that falls in the next split), or will the
framework handle this automatically?
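
For context on query (b), in Hadoop the splits are logical byte ranges and
the RecordReader is responsible for respecting record boundaries. The
convention used by Hadoop's line-oriented reader is that a reader whose
split does not start at offset 0 discards its first (possibly partial)
record, and every reader reads past the end of its split to finish the
record it has started. Below is a minimal sketch of that convention in
plain Java, with no Hadoop classes. The names BoundarySketch and readSplit
are illustrative, and the newline delimiter is only a stand-in for a
hypothetical frame marker; whether a video container offers such a marker
is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of the "skip first partial record, read past the end" convention.
class BoundarySketch {
    // Returns the index just past the next delimiter at or after 'from'.
    static int skipPastDelimiter(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Reads the records owned by the split [start, start + length).
    static List<String> readSplit(byte[] data, long start, long length) {
        long end = start + length;
        int pos = (int) start;
        if (start != 0) {
            // The previous split's reader owns the record that spans 'start'.
            pos = skipPastDelimiter(data, pos);
        }
        List<String> records = new ArrayList<>();
        // '<=' so a record starting exactly at 'end' is read here, since
        // the next reader will discard its first record.
        while (pos <= end && pos < data.length) {
            int next = skipPastDelimiter(data, pos);
            int recEnd = (data[next - 1] == '\n') ? next - 1 : next;
            records.add(new String(data, pos, recEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aaaa\nbbbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        for (long start = 0; start < data.length; start += 6) {
            long len = Math.min(6, data.length - start);
            System.out.println("split@" + start + " -> " + readSplit(data, start, len));
        }
    }
}
```

With this convention each record is produced by exactly one reader, even
when its bytes straddle two splits. The reader needs access to the whole
file rather than only the bytes of its own split.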



Thanks and Regards
Aniruddh