Posted to user@pig.apache.org by Br...@McAfee.com on 2011/07/26 20:12:29 UTC

Checking for '\n' and EOF in input

Hello,

I have a custom loader function that reads a parsed schema from some log files, but some of those files appear to be malformed. When loading a file, I need to detect whether a line ends with '\n' or is instead cut off by EOF.  I'm currently running Pig 0.8.0 with Hadoop 0.20.2, and my loader function uses the RecordReader class to read from a text file in the following way:

RecordReader in = null;

public Tuple getNext() throws IOException {
    try {
        boolean notDone = in.nextKeyValue();
        if (!notDone) {
            return null;
        }
        Text tval = (Text) in.getCurrentValue();
        String val = tval.toString();

However, this approach gives me no way to check for '\n' or EOF in the String val. Is it possible to use another type of RecordReader, or some other method, to check for these?  Any suggestions on how to do this in a custom Pig loader function?

Re: Checking for '\n' and EOF in input

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It sounds like you need to write your own RecordReader (and associated
InputFormat).

D
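
For illustration only, here is a minimal sketch of the line-reading logic such a custom RecordReader might wrap. The stock line-based reader strips the terminator before handing back the Text value, which is why val never contains '\n'. The class below is hypothetical (TerminatorAwareLineReader and its method names are not part of Hadoop or Pig). It uses only java.io so that it stands alone, and it assumes UTF-8 input with '\n' as the only terminator.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not a Hadoop or Pig class): reads one line at a time
 * and remembers whether the line was closed by '\n' or cut off by EOF.
 */
final class TerminatorAwareLineReader {
    private final InputStream in;
    private String line;
    private boolean endedWithNewline;

    TerminatorAwareLineReader(InputStream in) {
        this.in = in;
    }

    /** Advances to the next line; returns false once no data is left. */
    boolean next() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                // Properly terminated line (a trailing '\r' is NOT stripped here).
                line = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                endedWithNewline = true;
                return true;
            }
            buf.write(b);
        }
        if (buf.size() == 0) {
            // EOF with nothing pending: the input ended cleanly.
            return false;
        }
        // EOF reached in the middle of a line: report it, flagged as unterminated.
        line = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        endedWithNewline = false;
        return true;
    }

    /** The line read by the last successful next(), without its terminator. */
    String getLine() {
        return line;
    }

    /** True if the last line ended with '\n', false if it was cut off by EOF. */
    boolean endedWithNewline() {
        return endedWithNewline;
    }
}
```

In a real RecordReader, logic along these lines would presumably sit inside nextKeyValue(), with the flag exposed to the loader so that getNext() can skip or mark unterminated records. Split boundaries and buffering are left out of this sketch: reading one byte at a time is slow unless the stream is wrapped in a BufferedInputStream.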
