Posted to user@hive.apache.org by afancy <gr...@gmail.com> on 2012/10/03 10:20:14 UTC

What does ROW__OFFSET__INSIDE__BLOCK FROM mean?

Hi,

Could anybody explain to me what ROW__OFFSET__INSIDE__BLOCK means?
For example, I ran the following query, which returns two rows. Why does
the ROW__OFFSET__INSIDE__BLOCK column show 0?
From the column name, I understood that it should return the line
number of each record inside the block file, but both values are 0. So what
are the BLOCK, the BLOCK offset, and the row offset in a block?
The Hive bitmap documentation is very confusing.


hive> SELECT `url`, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE,
    > ROW__OFFSET__INSIDE__BLOCK FROM `testresult`
    > WHERE url='http://www.domain022.tl04/page035.html';

http://www.domain022.tl04/page035.html  hdfs://pc01:54310/user/hive/warehouse/testresult/testresults.csv  0        0
http://www.domain022.tl04/page035.html  hdfs://pc01:54310/user/hive/warehouse/testresult/testresults.csv  3200250  0
Time taken: 19.653 seconds
hive>


Regards,
afancy

Re: What does ROW__OFFSET__INSIDE__BLOCK FROM mean?

Posted by Navis류승우 <na...@nexr.com>.
It seems that ROW__OFFSET__INSIDE__BLOCK is meaningful only with
SequenceFileFormat (with block compression) or RCFileFormat.
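To get a non-zero value, the data would presumably have to be stored in one of those formats. Below is a minimal sketch, assuming the `testresult` table from the question; the new table names `testresult_rc` and `testresult_seq` are hypothetical, and the compression settings follow the usual Hive compressed-storage recipe, so check them against your Hive/Hadoop version.

```sql
-- Copy the text table into an RCFile table (hypothetical name testresult_rc).
CREATE TABLE testresult_rc STORED AS RCFILE
AS SELECT * FROM testresult;

-- Alternatively, write a block-compressed SequenceFile table
-- (hypothetical name testresult_seq).
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
CREATE TABLE testresult_seq STORED AS SEQUENCEFILE
AS SELECT * FROM testresult;

-- Re-run the original query against one of the new tables.
SELECT `url`, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE,
       ROW__OFFSET__INSIDE__BLOCK
FROM testresult_rc
WHERE url='http://www.domain022.tl04/page035.html';
```

As I understand it, with these formats BLOCK__OFFSET__INSIDE__FILE points at the start of the compressed block (or RCFile row group) and ROW__OFFSET__INSIDE__BLOCK at the row's position inside that block. For a plain text file each line is effectively its own "block", which would explain why every row shows 0.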

Re: What does ROW__OFFSET__INSIDE__BLOCK FROM mean?

Posted by Edward Capriolo <ed...@gmail.com>.
Make sure virtual column support is turned on in your hive-site.xml. I
have a feeling that this field is only supported by certain input
formats, because I was unable to get a non-zero number out of it. (I
think it only works with index files.)
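If "virtual column support" here means the row-offset switch, a minimal sketch of enabling it for one session follows. The property name `hive.exec.rowoffset` is an assumption on my part about the relevant setting (it can also be set as a property in hive-site.xml); please confirm it exists in your Hive version before relying on it.

```sql
-- Enable the row-offset virtual column for this session (assumed property name).
SET hive.exec.rowoffset=true;

-- With no value, SET prints the current setting, to confirm it took effect.
SET hive.exec.rowoffset;

-- Re-run the query from the original post.
SELECT `url`, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE,
       ROW__OFFSET__INSIDE__BLOCK
FROM `testresult`
WHERE url='http://www.domain022.tl04/page035.html';
```

On a plain text table this may still return 0 for every row, since the setting only exposes the column and does not change how the input format tracks blocks.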
