Posted to user@pig.apache.org by Ricky Ho <rh...@adobe.com> on 2009/05/20 06:52:53 UTC

Unable to get individual filename

Olga,

I have looked a bit deeper, and it sounds like it is not possible to get the individual file name from a custom loader UDF.  bindTo only gives me the directory name (not the individual file).  In other words, there is no way for my loader UDF to figure out the name of the file being loaded.

Am I on the wrong path?  What is your suggestion?

I have subscribed twice to the pig-user alias, but it doesn't seem to work.  I cannot see the mail that I send to this alias.

Rgds,
Ricky

-----Original Message-----
From: Ricky Ho 
Sent: Tuesday, May 19, 2009 4:41 PM
To: 'Olga Natkovich'
Subject: RE: PIG Q

OK, now the directory is working ...

I am not able to receive the mail I send to the Pig mail alias (to which I have already subscribed), so I don't know whether I will get anyone else's response.

I am trying to create a custom loader UDF as follows, but it doesn't seem to work.  Am I doing it right?

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.builtin.PigStorage;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.io.BufferedPositionedInputStream;

public class MyStore extends PigStorage {
	
	private String currentFile;

	// Remember the name Pig passes in so getNext() can attach it to each tuple.
	@Override
	public void bindTo(String fileName, BufferedPositionedInputStream in,
			long offset, long end) throws IOException {
		currentFile = fileName;
		super.bindTo(fileName, in, offset, end);
	}

	// Wrap each tuple from PigStorage as (fileName, originalTuple).
	@Override
	public Tuple getNext() throws IOException {
		Tuple next = super.getNext();
		if (next == null) {
			// End of input: return null so Pig stops calling getNext().
			return null;
		}
		List<Object> newList = new ArrayList<Object>();
		newList.add(currentFile);
		newList.add(next);
		return mTupleFactory.newTupleNoCopy(newList);
	}

}
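
The record-tagging idea in the loader above can be tried without Pig. The sketch below is a plain-Java stand-in, not Pig's API: TaggingSource and its getNext() are hypothetical names chosen for illustration. The one behavior it borrows from the loader convention is returning null once the input is exhausted, so that the caller knows when to stop reading.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a loader: tags every record with the name of its source. */
class TaggingSource {
    private final String sourceName;
    private final Iterator<String> records;

    TaggingSource(String sourceName, Iterator<String> records) {
        this.sourceName = sourceName;
        this.records = records;
    }

    /** Returns (sourceName, record), or null when there are no more records. */
    List<Object> getNext() {
        if (!records.hasNext()) {
            return null; // signal end of input instead of returning an empty record
        }
        List<Object> tagged = new ArrayList<Object>();
        tagged.add(sourceName);
        tagged.add(records.next());
        return tagged;
    }

    public static void main(String[] args) {
        TaggingSource source = new TaggingSource(
                "dirdir/data.txt",
                Arrays.asList("hello I am Ricky", "How are you", "I am fine").iterator());
        List<Object> tuple;
        while ((tuple = source.getNext()) != null) {
            System.out.println(tuple);
        }
    }
}
```

Without the null check, the read loop in main would never terminate, because every call would still produce a two-element record.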

Rgds,
Ricky

-----Original Message-----
From: Olga Natkovich [mailto:olgan@yahoo-inc.com] 
Sent: Tuesday, May 19, 2009 12:23 PM
To: Ricky Ho
Subject: RE: PIG Q

Yes, you definitely want to move to at least Pig 0.2.0 - it is a
completely re-written system.

Also, you might want to clarify your question on the list.

Good luck!

Olga 

> -----Original Message-----
> From: Ricky Ho [mailto:rho@adobe.com] 
> Sent: Tuesday, May 19, 2009 10:37 AM
> To: Olga Natkovich
> Subject: RE: PIG Q
> 
> You are right.  I am looking for shared code.
> I am using Pig 0.1.1 in Cygwin.  Let me try the latest one.
> 
> -----Original Message-----
> From: Olga Natkovich [mailto:olgan@yahoo-inc.com]
> Sent: Tuesday, May 19, 2009 10:16 AM
> To: Ricky Ho
> Subject: RE: PIG Q
> 
> Hi Ricky,
> 
> I just saw your message showing up on the user mailing list.
> It might take some time for others to respond if they have suggestions.
> 
> Your question might not be very clear. It is certainly
> doable, but you might need to write your own loader to do that.
> I think what you might want to ask is whether somebody has already
> done it and would share their code with you.
> 
> Regarding local mode, I just tested that it works fine in
> local mode as well. Are you using the latest Pig code? What are
> the permissions on your directory?
> 
> Olga
> 
> > -----Original Message-----
> > From: Ricky Ho [mailto:rho@adobe.com]
> > Sent: Tuesday, May 19, 2009 9:29 AM
> > To: Olga Natkovich
> > Subject: RE: PIG Q
> > 
> > I have sent the same question to
> > 'pig-user@hadoop.apache.org', but I have received no response.
> > Have you seen my question?  I wonder how many people are on the Pig alias.
> > 
> > I am now using "bin/pig -x local" and not the Hadoop DFS.
> > Does it make any difference?
> > 
> > In the first few lines, you can clearly see that Pig can read the
> > files under the directory "dirdir".
> > 
> > > grunt> A = LOAD 'dirdir/data.txt';
> > > grunt> DUMP A;
> > > (hello I am Ricky)
> > > (How are you)
> > > (I am fine)
> > 
> > But when I just read the directory, it doesn't work.
> > 
> > > grunt> B = LOAD 'dirdir';
> > > grunt> DUMP B;
> > > 2009-05-18 21:38:00,983 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable to open iterator for alias: B
> > >         at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
> > > Caused by: java.io.FileNotFoundException: dirdir (Access is denied)
> > 
> > Rgds, Ricky
> > 
> 

Re: Unable to get individual filename

Posted by zhang jianfeng <zj...@gmail.com>.
I found that this method cannot include the .txt files in subfolders.

It would be better if Pig provided a load function that can use a regex to
match file names, for example:

Load 'data/' USING FilePatternStorage('*.txt')

Such a loader, FilePatternStorage, would load all the .txt files in the folder
"data", including those in its subfolders.

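The selection rule proposed above (match on the file name, at any depth under the folder) can be sketched in plain Java. This is only an illustration under stated assumptions: FilePatternFilter and select are hypothetical names, nothing here is Pig's API, and the pattern is treated as a java.nio glob because the example pattern '*.txt' is a glob, not a regex.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: picks files under a root, at any depth, by file-name glob. */
class FilePatternFilter {

    /** Keeps the candidates that lie under root and whose last path component matches nameGlob. */
    static List<String> select(String root, String nameGlob, List<String> candidates) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + nameGlob);
        Path rootPath = Paths.get(root).normalize();
        List<String> selected = new ArrayList<String>();
        for (String candidate : candidates) {
            Path path = Paths.get(candidate).normalize();
            Path name = path.getFileName();
            // Only the file name is matched, so the depth below root does not matter.
            if (name != null && path.startsWith(rootPath) && matcher.matches(name)) {
                selected.add(candidate);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "data/a.txt", "data/sub/b.txt", "data/c.csv", "other/d.txt");
        // Keeps data/a.txt and data/sub/b.txt; skips the .csv file and the file outside data/.
        System.out.println(select("data/", "*.txt", files));
    }
}
```

Matching on the file name alone is what lets files in subfolders through; a pattern applied to the whole path would have to account for the intermediate directories.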






2009/5/21 Marshall Weir <ma...@gmail.com>

> How would this be any different from:
>
> myData = LOAD '/data/*.txt' AS (id, data);
>
> Thanks,
> Marshall

RE: Unable to get individual filename

Posted by zjffdu <zj...@gmail.com>.
Great, Pig already supports this feature. I thought it did not.
I regret never having tried it.



-----Original Message-----
From: Marshall Weir [mailto:marshall.weir@gmail.com] 
Sent: May 21, 2009 21:51
To: pig-user@hadoop.apache.org
Subject: Re: Unable to get individual filename

How would this be any different from:

myData = LOAD '/data/*.txt' AS (id, data);

Thanks,
Marshall



Re: Unable to get individual filename

Posted by Marshall Weir <ma...@gmail.com>.
How would this be any different from:

myData = LOAD '/data/*.txt' AS (id, data);

Thanks,
Marshall
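
One way to see how far a pattern such as '/data/*.txt' reaches is to test it against a few paths. The sketch below uses Java's own java.nio glob matcher as a stand-in. It is an assumption, not something confirmed in this thread, that Pig's path globbing treats directory boundaries the same way; GlobDepthDemo is a hypothetical name.

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

/** Hypothetical demo of how deep a glob reaches under java.nio matching rules. */
class GlobDepthDemo {

    /** Returns true if the whole path matches the glob. */
    static boolean matches(String glob, String path) {
        return FileSystems.getDefault()
                .getPathMatcher("glob:" + glob)
                .matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // A single * stays inside one directory level.
        System.out.println(matches("/data/*.txt", "/data/a.txt"));      // true
        System.out.println(matches("/data/*.txt", "/data/sub/b.txt"));  // false
        // In java.nio globs, ** may cross directory boundaries.
        System.out.println(matches("/data/**.txt", "/data/sub/b.txt")); // true
    }
}
```

Under these rules, a single-level glob selects the .txt files directly inside /data but none in its subfolders.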

On May 21, 2009, at 9:40 AM, zjffdu wrote:

> I also think it is useful to provide another kind of PigStorage to load part
> of the files in one folder, such as load all the .txt files in the root
> folder,


RE: Unable to get individual filename

Posted by zjffdu <zj...@gmail.com>.
I also think it would be useful to provide another kind of PigStorage that loads
only some of the files in a folder, such as all the .txt files in the root
folder.

I think the best place to do this is here (POLoad.java, line 92):

    public void setUp() throws IOException {
        String filename = lFile.getFileName();
        loader = (LoadFunc) PigContext.instantiateFuncFromSpec(lFile.getFuncSpec());

        // This is where I can control which files get loaded;
        // the default is to load all the files.
        is = FileLocalizer.open(filename, pc);

        loader.bindTo(filename, new BufferedPositionedInputStream(is), 0, Long.MAX_VALUE);
    }


In my opinion, I can provide a different "is" depending on the FuncSpec that the
Pig script provides. This is what I'd like to change the code to:

    public void setUp() throws IOException {
        String filename = lFile.getFileName();
        loader = (LoadFunc) PigContext.instantiateFuncFromSpec(lFile.getFuncSpec());

        if (lFile.getFuncSpec().getClassName().equals(ExtPigStorage.class.getName())) {
            // Proposed: pass the loader's constructor arguments (file extensions)
            // to a new FileLocalizer.open overload that filters files by extension.
            String[] ext = lFile.getFuncSpec().getCtorArgs();
            is = FileLocalizer.open(filename, pc, ext);
        } else {
            is = FileLocalizer.open(filename, pc);
        }
        loader.bindTo(filename, new BufferedPositionedInputStream(is), 0, Long.MAX_VALUE);
    }



I can create a subclass of PigStorage called ExtPigStorage.

What I need to do is provide a different kind of "is" that controls
which files are loaded.
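
The idea of handing the loader a stream that only covers the wanted files can be sketched independently of Pig. The code below is an illustration under assumptions: ExtensionFilteredOpen, open, and readAll are hypothetical names, an in-memory map of file name to contents stands in for the folder, and concatenating the matching files in name order is a choice made here, not something the proposal specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one input stream over only the files with the wanted extensions. */
class ExtensionFilteredOpen {

    /** Opens a single stream over every file whose name ends with one of exts; no exts means all files. */
    static InputStream open(Map<String, String> files, String... exts) {
        List<InputStream> parts = new ArrayList<InputStream>();
        // Copying into a TreeMap gives a deterministic, name-sorted concatenation order.
        for (Map.Entry<String, String> entry : new TreeMap<String, String>(files).entrySet()) {
            if (accepts(entry.getKey(), exts)) {
                parts.add(new ByteArrayInputStream(
                        entry.getValue().getBytes(StandardCharsets.UTF_8)));
            }
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    private static boolean accepts(String name, String[] exts) {
        if (exts.length == 0) {
            return true; // default behavior: load everything
        }
        for (String ext : exts) {
            if (name.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    /** Drains the stream into a string; wraps IOException so callers stay simple. */
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> folder = new TreeMap<String, String>();
        folder.put("a.txt", "first\n");
        folder.put("b.log", "skipped\n");
        folder.put("c.txt", "second\n");
        System.out.print(readAll(open(folder, ".txt")));
    }
}
```

Because the filtering happens when the stream is built, the code that reads from it does not need to know which files were skipped.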

Olga, what do you think about my proposal?

If you feel it's OK, I can create a JIRA issue and submit a patch.


Thank you.


Jeff Zhang



