Posted to dev@kylin.apache.org by 杨海乐 <ya...@letv.com> on 2015/11/27 04:03:27 UTC

String index out of range

Hello,
During the base cuboid build step, the following exception is thrown and the reduce tasks receive a task-kill signal.
Error: java.lang.StringIndexOutOfBoundsException: String index out of range: 7595
at java.lang.String.checkBounds(String.java:375)
at java.lang.String.<init>(String.java:452)
at org.apache.kylin.common.util.Bytes.toString(Bytes.java:375)
at org.apache.kylin.common.util.BytesSplitter.toString(BytesSplitter.java:132)
at java.lang.String.valueOf(String.java:2849)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.handleErrorRecord(BaseCuboidMapper.java:229)
at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:223)
at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:55)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


Re: String index out of range

Posted by 杨海乐 <ya...@letv.com>.
Thanks, hongbin. I would like to know in which version this problem is resolved.



--
View this message in context: http://apache-kylin-incubating.74782.x6.nabble.com/String-index-out-of-range-tp2538p2628.html
Sent from the Apache Kylin (Incubating) mailing list archive at Nabble.com.

Re: String index out of range

Posted by hongbin ma <ma...@apache.org>.
@haile, you are misunderstanding it.
4096 is a limit on column size, not row size. You can check the code at
BaseCuboidMapperBase:86.


On Fri, Nov 27, 2015 at 4:55 PM, 杨海乐 <ya...@letv.com> wrote:

> As Shaofeng said, it looks like the 4k limit applies to a whole row of the
> intermediate Hive table (measures plus dimensions), not to a single column.



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: String index out of range

Posted by 杨海乐 <ya...@letv.com>.
As Shaofeng said, it looks like the 4k limit applies to a whole row of the
intermediate Hive table (measures plus dimensions), not to a single column.




Re: String index out of range

Posted by yu feng <ol...@gmail.com>.
In my opinion, one column in one row is too long, which makes the map
function throw an exception. Then, while printing the error message (in the
handleErrorRecord function), a StringIndexOutOfBoundsException is thrown
because the column is longer than the allocated 4096 bytes.
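
A minimal sketch of this failure mode, assuming the column bytes sit in a
fixed 4096-byte buffer while the recorded length is the real, longer column
length (7595 in the stack trace above). The class and method names are
hypothetical and this is not Kylin's actual code; it only shows that decoding
more bytes than the buffer holds fails inside the String constructor.

```java
import java.nio.charset.StandardCharsets;

public class ColumnBufferDemo {
    // Assumed per-column buffer size, taken from the 4096 discussed in this thread.
    static final int MAX_COLUMN_BYTES = 4096;

    /**
     * Tries to decode `length` bytes from a buffer of MAX_COLUMN_BYTES bytes.
     * Returns true if the String constructor rejects the requested range.
     */
    static boolean decodeFails(int length) {
        byte[] buffer = new byte[MAX_COLUMN_BYTES];
        try {
            new String(buffer, 0, length, StandardCharsets.UTF_8);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // StringIndexOutOfBoundsException is a subclass of this exception.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("4096 bytes fails: " + decodeFails(4096));
        System.out.println("7595 bytes fails: " + decodeFails(7595));
    }
}
```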

2015-11-27 16:06 GMT+08:00 杨海乐 <ya...@letv.com>:

> Is the problem caused by the content length of some rows, made up of
> dimensions, in the intermediate Hive table?

Re: String index out of range

Posted by 杨海乐 <ya...@letv.com>.
Is the problem caused by the content length of some rows, made up of
dimensions, in the intermediate Hive table?




Re: String index out of range

Posted by yu feng <ol...@gmail.com>.
You can refer to https://issues.apache.org/jira/browse/KYLIN-1104

2015-11-27 13:14 GMT+08:00 杨海乐 <ya...@letv.com>:

> I don't quite understand. Can you tell me more about the problem?
> Thanks.

Re: String index out of range

Posted by 杨海乐 <ya...@letv.com>.
I don't quite understand. Can you tell me more about the problem? Thanks.




Re: String index out of range

Posted by "Shi, Shaofeng" <sh...@ebay.com>.
In the step that creates the intermediate flat table, only the dimension and
measure columns are kept. So if one line cannot fit into the 4k space when
building the base cuboid, the user needs to pay attention to the quality of
the raw data.
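
One way to check the raw data up front is to measure each value in bytes
rather than characters, since multi-byte text reaches the limit sooner. A
minimal sketch, assuming UTF-8 encoded values and the 4096 limit discussed in
this thread; ColumnSizeCheck and its methods are hypothetical helpers, not
part of Kylin.

```java
import java.nio.charset.StandardCharsets;

public class ColumnSizeCheck {
    /** Number of bytes the value occupies when encoded as UTF-8. */
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the encoded value would not fit into maxBytes. */
    static boolean exceedsLimit(String value, int maxBytes) {
        return utf8Length(value) > maxBytes;
    }

    public static void main(String[] args) {
        // Build 2000 ASCII characters and 2000 CJK characters.
        StringBuilder ascii = new StringBuilder();
        StringBuilder cjk = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            ascii.append('a');
            cjk.append('\u6570'); // one CJK character, 3 bytes in UTF-8
        }
        System.out.println("ascii bytes: " + utf8Length(ascii.toString()));
        System.out.println("cjk bytes:   " + utf8Length(cjk.toString()));
        System.out.println("cjk exceeds 4096: " + exceedsLimit(cjk.toString(), 4096));
    }
}
```

The same character count can therefore pass or fail the check depending on
the script used in the column.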

On 11/27/15, 12:51 PM, "Li Yang" <li...@apache.org> wrote:

>The 4096 limit can be increased.
>
>I am starting to think it makes sense to give it a bigger value. The buffer
>is used to read in a row from the Hive table; note that not all columns of
>the Hive table become cube dimensions. It is entirely possible to build a
>cube with 5 dimensions from a 100-column Hive table.


Re: String index out of range

Posted by Li Yang <li...@apache.org>.
I increased the limit to 16384 in the 1.x and 2.x branches.

On Fri, Nov 27, 2015 at 12:51 PM, Li Yang <li...@apache.org> wrote:

> The 4096 limit can be increased.
>
> I am starting to think it makes sense to give it a bigger value. The buffer
> is used to read in a row from the Hive table; note that not all columns of
> the Hive table become cube dimensions. It is entirely possible to build a
> cube with 5 dimensions from a 100-column Hive table.

Re: String index out of range

Posted by Li Yang <li...@apache.org>.
The 4096 limit can be increased.

I am starting to think it makes sense to give it a bigger value. The buffer
is used to read in a row from the Hive table; note that not all columns of
the Hive table become cube dimensions. It is entirely possible to build a
cube with 5 dimensions from a 100-column Hive table.



On Fri, Nov 27, 2015 at 12:14 PM, hongbin ma <ma...@apache.org> wrote:

> Maybe we can parameterize the max column size, if that is necessary.

Re: String index out of range

Posted by hongbin ma <ma...@apache.org>.
Maybe we can parameterize the max column size, if that is necessary.
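
Such a parameter could look roughly like the sketch below, which reads the
limit from a java.util.Properties object and falls back to 4096. The property
key example.cuboid.max-column-bytes and the class name are made up for
illustration; they are not existing Kylin settings.

```java
import java.util.Properties;

public class MaxColumnSizeConfig {
    // Hypothetical property key, invented for this sketch.
    static final String KEY = "example.cuboid.max-column-bytes";
    // Default taken from the value discussed in this thread.
    static final int DEFAULT_MAX_COLUMN_BYTES = 4096;

    /** Returns the configured limit, or the default when the key is absent or blank. */
    static int maxColumnBytes(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MAX_COLUMN_BYTES;
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println("default limit:    " + maxColumnBytes(props));
        props.setProperty(KEY, "16384");
        System.out.println("configured limit: " + maxColumnBytes(props));
    }
}
```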

On Fri, Nov 27, 2015 at 11:26 AM, yu feng <ol...@gmail.com> wrote:

> Ha-ha, I have run into this problem too. I cannot change my source data in
> Hive the way I could in an RDS, so I modified some code... Dimension values
> like this are really troublesome.




Re: String index out of range

Posted by yu feng <ol...@gmail.com>.
Ha-ha, I have run into this problem too. I cannot change my source data in
Hive the way I could in an RDS, so I modified some code... Dimension values
like this are really troublesome.

2015-11-27 11:05 GMT+08:00 hongbin ma <ma...@apache.org>:

> BaseCuboidMapperBase:86
>
>  bytesSplitter = new BytesSplitter(200, 4096);
>
> The max length for each column is 4096; actually, columns that large do
> not make a lot of sense.

Re: String index out of range

Posted by hongbin ma <ma...@apache.org>.
BaseCuboidMapperBase:86

 bytesSplitter = new BytesSplitter(200, 4096);

The max length for each column is 4096; actually, columns that large do not
make a lot of sense.
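
To make the two constructor arguments concrete, here is a simplified,
hypothetical splitter (not Kylin's BytesSplitter). It assumes the first
argument caps the number of columns and the second caps the bytes per column,
and it rejects an oversized column with a descriptive error instead of
failing later while the record is being formatted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedColumnSplitter {
    private final int maxColumns;
    private final int maxColumnBytes;

    public FixedColumnSplitter(int maxColumns, int maxColumnBytes) {
        this.maxColumns = maxColumns;
        this.maxColumnBytes = maxColumnBytes;
    }

    /** Splits a row on the delimiter byte, enforcing both limits per row. */
    public List<String> split(byte[] row, byte delimiter) {
        List<String> columns = new ArrayList<>();
        int start = 0;
        // Walk one position past the end so the last column is emitted too.
        for (int i = 0; i <= row.length; i++) {
            if (i == row.length || row[i] == delimiter) {
                int length = i - start;
                if (columns.size() >= maxColumns) {
                    throw new IllegalArgumentException(
                            "row has more than " + maxColumns + " columns");
                }
                if (length > maxColumnBytes) {
                    throw new IllegalArgumentException(
                            "column " + columns.size() + " is " + length
                                    + " bytes, limit is " + maxColumnBytes);
                }
                columns.add(new String(row, start, length, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        FixedColumnSplitter splitter = new FixedColumnSplitter(200, 4096);
        byte[] row = "2015-11-27,letv,42".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitter.split(row, (byte) ','));
    }
}
```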




