Posted to user@pig.apache.org by Damien Hardy <dh...@figarocms.fr> on 2011/09/20 18:02:43 UTC

HBaseStorage STORE : IndexOutOfBoundsException

Hello,

This is my Pig script:
DEFINE iplookup `wrapper.sh GeoIP` ship ('wrapper.sh') cache('/GeoIP/GeoIPcity.dat#GeoIP') input (stdin using PigStreaming(',')) output (stdout using PigStreaming(','));

A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body','-gt=_f:squid_t:20110920 -loadKey') AS (rowkey, data);
B = FILTER A BY rowkey matches '.*_s:204-.*';
C = FOREACH B {
	t = REGEX_EXTRACT(data,'([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ',1);
	generate rowkey, t;
}
D = STREAM C THROUGH iplookup;
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip,location:country_code,location:country_code3,location:country_name,location:region,location:city,location:postal_code,location:latitude,location:longitude,location:area_code,location:metro_code');
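For the curious, the REGEX_EXTRACT pattern in the script can be exercised in plain Java with the same pattern string; the log line below is invented for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpExtractDemo {
    // Same pattern as the REGEX_EXTRACT call above; group 1 is the client IP.
    private static final Pattern IP_PORT = Pattern.compile(
            "([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ");

    static String extractIp(String line) {
        Matcher m = IP_PORT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A made-up squid-style log fragment, just for illustration.
        System.out.println(extractIp("210.123.94.89:3128 GET /index.html"));
        // prints 210.123.94.89
    }
}
```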


There are 11 columns in my final table/column family (STORE).

I get some jobs (2 of 46) ending with:

java.lang.IndexOutOfBoundsException: Index: 11, Size: 11
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:666)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)

Most of the jobs ended successfully.

In src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java, around line 666 (damn!):

        for (int i = 1; i < t.size(); ++i) {
            ColumnInfo columnInfo = columnInfo_.get(i - 1);
            if (LOG.isDebugEnabled()) {
                LOG.debug("putNext - tuple: " + i + ", value=" + t.get(i) +
                        ", cf:column=" + columnInfo);
            }

Is it possible that columnInfo_ and t are not the same size? In which case?
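A simplified model of that loop's index arithmetic (this is not the real HBaseStorage code; columnInfo_ is stood in by a plain list of names) shows how a tuple with one field too many would trigger exactly "Index: 11, Size: 11":

```java
import java.util.Arrays;
import java.util.List;

public class PutNextBoundsDemo {
    // Returns the index that would be passed to columnInfo_.get() when it
    // goes out of bounds, or -1 when the sizes line up. Like putNext, the
    // loop walks the tuple from index 1 (field 0 is the row key) while
    // indexing the column list from 0.
    static int firstBadIndex(int tupleSize, List<String> columnInfo) {
        for (int i = 1; i < tupleSize; ++i) {
            if (i - 1 >= columnInfo.size()) {
                return i - 1; // here ArrayList.get() would throw
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList(
                "ip", "country_code", "country_code3", "country_name",
                "region", "city", "postal_code", "latitude", "longitude",
                "area_code", "metro_code"); // 11 configured columns
        // 1 row key + 11 values = tuple of 12: fine.
        System.out.println(firstBadIndex(12, cols)); // -1
        // 1 row key + 12 values = tuple of 13: Index: 11, Size: 11.
        System.out.println(firstBadIndex(13, cols)); // 11
    }
}
```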

Regards,

-- 
Damien




Re: HBaseStorage STORE : IndexOutOfBoundsException

Posted by Damien Hardy <dh...@figarocms.fr>.
\o/ Thanks !

hbase(main):001:0> scan 'geoip_pig', { FILTER =>
  org.apache.hadoop.hbase.filter.SingleColumnValueFilter.new(
    org.apache.hadoop.hbase.util.Bytes.toBytes('location'),
    org.apache.hadoop.hbase.util.Bytes.toBytes('ip'),
    org.apache.hadoop.hbase.filter.CompareFilter::CompareOp.valueOf('EQUAL'),
    org.apache.hadoop.hbase.filter.SubstringComparator.new('210.123.94.89')) }
ROW                                                                COLUMN+CELL
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:area_code, timestamp=1316597913077, value=
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:city, timestamp=1316597913077, value=Seoul
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:country_code, timestamp=1316597913077, value=KR
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:country_code3, timestamp=1316597913077, value=KOR
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:country_name, timestamp=1316597913077, value=Korea, Republic of
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:ip, timestamp=1316597913077, value=210.123.94.89
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:latitude, timestamp=1316597913077, value=37.5664
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:longitude, timestamp=1316597913077, value=126.9997
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:metro_code, timestamp=1316597913077, value=
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:postal_code, timestamp=1316597913077, value=
 _f:squid_t:20110920162145_b:squid_s:204-2HONaSaoMEdmcxVnTAeAMQ==  column=location:region, timestamp=1316597913077, value=11
1 row(s) in 3.4430 seconds



On 21/09/2011 11:13, Damien Hardy wrote:
> [...]


Re: HBaseStorage STORE : IndexOutOfBoundsException

Posted by Damien Hardy <dh...@figarocms.fr>.
Hello,

OK, I found the culprit:

I added
LOG.info("putNext - tuple: " + i + ", value=" + t.get(i));
at line 666 in src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java.

So in one of the jobs I get:

[...]
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 1, value=198.36.86.87
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 2, value=US
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 3, value=USA
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 4, value=United States
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 5, value=MI
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 6, value=Manchester
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 7, value=48158
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 8, value=42.1616
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 9, value=-84.0238
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 10, value=734
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 11, value=505
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 1, value=210.123.94.89
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 2, value=KR
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 3, value=KOR
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 4, value=Korea
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 5, value= Republic of
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 6, value=11
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 7, value=Seoul
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 8, value=null
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 9, value=37.5664
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 10, value=126.9997
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 11, value=null
2011-09-21 11:07:51,584 INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage: putNext - tuple: 12, value=null
2011-09-21 11:07:51,587 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2011-09-21 11:07:51,589 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IndexOutOfBoundsException: Index: 11, Size: 11
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:667)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
2011-09-21 11:07:51,591 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task


echo '210.123.94.89' | ./geostream.pl 
/usr/local/share/GeoIP/GeoLiteCity.dat
210.123.94.89,KR,Korea, Republic of,11,Seoul

The country name "Korea, Republic of" contains a comma, which is my streaming delimiter, so the record splits into 12 fields instead of 11. I have to use another delimiter.
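The mismatch is easy to reproduce with a plain split; Java's String.split stands in here for PigStreaming(','), and the pipe is just one possible replacement delimiter:

```java
public class DelimiterDemo {
    // Number of fields a streaming reader would see for the given delimiter.
    static int countFields(String line, String delimRegex) {
        return line.split(delimRegex).length;
    }

    public static void main(String[] args) {
        // The geostream.pl output line shown above: 5 intended fields
        // (ip, country_code, country_name, region, city).
        String line = "210.123.94.89,KR,Korea, Republic of,11,Seoul";
        System.out.println(countFields(line, ","));
        // 6: the comma inside "Korea, Republic of" cuts country_name in two

        // A delimiter that cannot occur in the data avoids the extra field:
        String safe = "210.123.94.89|KR|Korea, Republic of|11|Seoul";
        System.out.println(countFields(safe, "\\|")); // 5
    }
}
```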

Thank you,

Regards,

-- 
Damien


On 20/09/2011 23:35, Dmitriy Ryaboy wrote:
> Can you dump D and examine it manually to see if there are cases when the
> number of columns is not the same as you expect?
> [...]


Re: HBaseStorage STORE : IndexOutOfBoundsException

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Can you dump D and examine it manually to see if there are cases when the
number of columns is not the same as you expect?

On Tue, Sep 20, 2011 at 9:02 AM, Damien Hardy <dh...@figarocms.fr> wrote:

> [...]