Posted to user@orc.apache.org by Korry Douglas <ko...@me.com> on 2019/03/04 16:52:05 UTC

Question about using indexes/statistics

Hi all, I am trying to implement predicate-pushdown in my PostgreSQL->ORC foreign data wrapper (FDW). 

If I understand correctly, I should be able to:

1) skip the entire file based on the statistics returned by Reader::getStatistics()

2) skip individual stripes based on the statistics returned by Reader::getStripeStatistics(stripeNumber)

3) skip individual row groups based on the stats returned by StripeStatistics::getRowIndexStatistics(columnId, indexId)

Is that correct so far? I understand that I will have to apply the predicate expression to the min/max values for each of the above-mentioned stats.
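
The min/max test described above can be sketched as follows. This is a minimal illustration using stand-in types, not the ORC API: IntStats, CompareOp, and predicateRefutes are hypothetical names, and in a real reader the bounds would come from the column statistics objects that ORC returns. The function only answers "can it be proven that no row satisfies column <op> constant?"; a false result means the data still has to be read and filtered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the min/max part of one column's statistics.
struct IntStats {
    bool hasMinMax;   // false when no bounds were recorded
    int64_t minimum;
    int64_t maximum;
};

enum class CompareOp { Equal, Less, LessEqual, Greater, GreaterEqual };

// Returns true only when the statistics prove that no row can satisfy
// "column <op> constant". When nothing can be proven, returns false.
bool predicateRefutes(const IntStats& stats, CompareOp op, int64_t constant) {
    if (!stats.hasMinMax) {
        return false;  // no bounds: the range cannot be skipped safely
    }
    assert(stats.minimum <= stats.maximum);
    switch (op) {
        case CompareOp::Equal:         // constant lies outside [min, max]
            return constant < stats.minimum || constant > stats.maximum;
        case CompareOp::Less:          // every value is >= constant
            return stats.minimum >= constant;
        case CompareOp::LessEqual:     // every value is > constant
            return stats.minimum > constant;
        case CompareOp::Greater:       // every value is <= constant
            return stats.maximum <= constant;
        case CompareOp::GreaterEqual:  // every value is < constant
            return stats.maximum < constant;
    }
    return false;
}
```

The same check can be applied unchanged at each level (file, stripe, row group), because each level supplies the same kind of bounds for a smaller range of rows.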

I’m having trouble figuring out how to actually skip around within the file.

The file that I am experimenting with is the demo-11-none.orc file from the examples directory.

Here are the details of this file:

stt=# SELECT * FROM orc_get_file_info('/work/stt/orc/examples/demo-11-none.orc');
┌─[ RECORD 1 ]───────────┬───────────┐
│ file_size              │ 5147970   │
│ columns                │ 10        │
│ rows                   │ 1920800   │
│ stripes                │ 385       │
│ version                │ 0.11      │
│ writer_id              │ orc_java  │
│ compression_kind       │ none      │
│ writer_version         │ HIVE-8732 │
│ row_index_stride       │ 10000     │
│ stripe_stat_count      │ 385       │
│ content_length         │ 5069718   │
│ strip_stats_length     │ 70748     │
│ file_footer_length     │ 7481      │
│ file_postscript_length │ 22        │
└────────────────────────┴───────────┘

I’m using a RowReader to scan my way through the file - when I call RowReader::createRowBatch(), I specify a batch size of 1000 rows.

I can figure out how to use the stats returned by Reader::getStatistics() - I’ll just skip to the next file if the predicate expression cannot be satisfied according to those stats.

I can read the stats for each stripe and can evaluate the predicate.  But my question is, how do I skip to the next stripe?

Do I just do something like this: 

if (predicateRefutes(stripeStats))
{
   rowReader->seekToRow(rowReader->getRowNumber() + stripeInfo->getNumberOfRows());
   rowReader->next(myColumnVectorBatch);
}

In other words, is seekToRow() the proper way to move from stripe to stripe?  

Does the batch size (1000 rows in my case) have anything to do with this?

Thanks.
    
             — Korry


Re: Question about using indexes/statistics

Posted by Korry Douglas <ko...@me.com>.
It sure does!  Now it all makes sense - I can’t believe I missed that even when I was looking right at the inheritance diagram.

Apologies and thanks.


               — Korry

> On Mar 6, 2019, at 6:09 PM, ustcwg@gmail.com wrote:
> 
> StripeStatistics inherits from Statistics, so it also has a getColumnStatistics function, which returns the stripe-level stats that you want.


Re: Question about using indexes/statistics

Posted by us...@gmail.com.
StripeStatistics inherits from Statistics, so it also has a getColumnStatistics function, which returns the stripe-level stats that you want.
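
The inheritance relationship can be illustrated with a small self-contained sketch. The classes below are simplified stand-ins that reuse the names from this thread; they are not the real ORC declarations (the signatures and return types here are assumptions made for brevity), and ColumnStats, FakeStripeStatistics, and columnMaximum are hypothetical helpers. The point is only that a member declared on the base class is callable on the derived class, even though it does not appear in the derived class's own member list.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in for one column's min/max summary.
struct ColumnStats {
    int64_t minimum;
    int64_t maximum;
};

class Statistics {
public:
    virtual ~Statistics() = default;
    // One summary per column, covering everything this object describes.
    virtual const ColumnStats& getColumnStatistics(uint32_t columnId) const = 0;
};

class StripeStatistics : public Statistics {
public:
    // Finer-grained summaries: one per row group within the stripe.
    virtual uint32_t getNumberOfRowIndexStats(uint32_t columnId) const = 0;
    virtual const ColumnStats& getRowIndexStatistics(uint32_t columnId,
                                                     uint32_t rowIndexId) const = 0;
};

// Hypothetical in-memory implementation, used only to exercise the interface.
class FakeStripeStatistics : public StripeStatistics {
public:
    FakeStripeStatistics(std::vector<ColumnStats> stripeLevel,
                         std::vector<std::vector<ColumnStats>> rowGroupLevel)
        : stripeLevel_(std::move(stripeLevel)),
          rowGroupLevel_(std::move(rowGroupLevel)) {
        assert(stripeLevel_.size() == rowGroupLevel_.size());
    }
    const ColumnStats& getColumnStatistics(uint32_t columnId) const override {
        return stripeLevel_.at(columnId);
    }
    uint32_t getNumberOfRowIndexStats(uint32_t columnId) const override {
        return static_cast<uint32_t>(rowGroupLevel_.at(columnId).size());
    }
    const ColumnStats& getRowIndexStatistics(uint32_t columnId,
                                             uint32_t rowIndexId) const override {
        return rowGroupLevel_.at(columnId).at(rowIndexId);
    }
private:
    std::vector<ColumnStats> stripeLevel_;                 // indexed by column
    std::vector<std::vector<ColumnStats>> rowGroupLevel_;  // column, then row group
};

// Accepts any Statistics, so a StripeStatistics can be passed directly.
int64_t columnMaximum(const Statistics& stats, uint32_t columnId) {
    return stats.getColumnStatistics(columnId).maximum;
}
```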

Sent from my iPhone

> On Mar 6, 2019, at 14:59, Korry Douglas <ko...@me.com> wrote:
> 
> StripeStatistics

Re: Question about using indexes/statistics

Posted by Korry Douglas <ko...@me.com>.
I understand that part, thanks.  

What I’m confused about is that I expected the following:

1 - The file level stats would contain the min and max values for each column in the entire file
2 - The stripe level stats would contain the min and max values for a given stripe
3 - The rowgroup stats would contain the min and max values for a row group within a stripe

What seems to be missing is #2 - there seems to be no StripeStatistics::getColumnStatistics() function.  

The only member functions in StripeStatistics are:
getNumberOfRowIndexStats()
getRowIndexStatistics()


Am I missing something?

                 — Korry

> On Mar 6, 2019, at 4:44 PM, Gang Wu <ga...@apache.org> wrote:
> 
> The following function returns the stripe-level & row-group-level statistics of the stripe specified by input. 
> ORC_UNIQUE_PTR<StripeStatistics> Reader::getStripeStatistics(uint64_t stripeIndex) const;
> You need to call StripeStatistics::getColumnStatistics to get stripe-level stats and StripeStatistics::getRowIndexStatistics to get row-group-level stats.
> 
> Gang


Re: Question about using indexes/statistics

Posted by Gang Wu <ga...@apache.org>.
The following function returns the stripe-level & row-group-level
statistics of the stripe specified by input.
ORC_UNIQUE_PTR<StripeStatistics> Reader::getStripeStatistics(uint64_t stripeIndex) const;
You need to call StripeStatistics::getColumnStatistics to get
stripe-level stats and StripeStatistics::getRowIndexStatistics to get
row-group-level stats.
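
A rough sketch of how the two levels can be combined: check the stripe-level bounds first, and look at row-group bounds only for stripes that survive. The types here (Range, StripeSummary, PruneResult) and the function selectRowGroups are hypothetical stand-ins holding plain min/max pairs; in real code the values would be read from the statistics objects returned by the calls named above. The sketch handles only an equality predicate on one integer column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Range {
    int64_t minimum;
    int64_t maximum;
};

struct StripeSummary {
    Range stripe;                  // stripe-level bounds for the column
    std::vector<Range> rowGroups;  // one entry per row group in the stripe
};

// True when "column == value" could match somewhere inside the range.
inline bool mayContain(const Range& r, int64_t value) {
    assert(r.minimum <= r.maximum);
    return r.minimum <= value && value <= r.maximum;
}

struct PruneResult {
    // (stripe index, row group index) pairs that still have to be read.
    std::vector<std::pair<std::size_t, std::size_t>> toRead;
    std::size_t statsEvaluated = 0;  // number of min/max checks performed
};

PruneResult selectRowGroups(const std::vector<StripeSummary>& stripes,
                            int64_t value) {
    PruneResult result;
    for (std::size_t s = 0; s < stripes.size(); ++s) {
        ++result.statsEvaluated;
        if (!mayContain(stripes[s].stripe, value)) {
            continue;  // whole stripe refuted: its row groups are never examined
        }
        for (std::size_t g = 0; g < stripes[s].rowGroups.size(); ++g) {
            ++result.statsEvaluated;
            if (mayContain(stripes[s].rowGroups[g], value)) {
                result.toRead.emplace_back(s, g);
            }
        }
    }
    return result;
}
```

With stripe-level bounds available, a refuted stripe costs one check rather than one check per row group.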

Gang



On Wed, Mar 6, 2019 at 9:26 AM Korry Douglas <ko...@me.com> wrote:

> Actually, I don’t think I got it quite right.  I wrote:
>
> If I understand correctly, I should be able to:
>
> 1) skip the entire file based on the statistics returned by
> Reader::getStatistics()
>
> 2) skip individual stripes based on the statistics returned by
> Reader::getStripeStatistics(stripeNumber)
>
> 3) skip individual row groups based on the stats returned by
> StripeStatistics::getRowIndexStatistics(columnId, indexId)
>
>
>
> But it looks like there are not really three levels of stats.
>
> Reader::getStatistics() will tell me whether I can skip the entire file
> (because the predicates are refuted by the stats)
>
> But Reader::getStripeStatistics(stripeNumber) returns a pointer to a
> StripeStatistics object. Using that pointer, I can fetch the stats for each
> row group in the stripe, but there is no overall summary that gives me the
> min and max for the entire stripe.
>
> So if I have 5 stripes, each containing 10 row groups, I can’t skip any
> given stripe without evaluating 10 sets of statistics (one for each of the
> 10 row groups in the stripe).  Is that correct?
>
> Are the row group statistics (row index stats) stored in the file footer
> or will I be doing a lot of seeking within the file in order to find those
> 500 stats objects?
>
> Thanks again for the help.
>
>
>                 — Korry

Re: Question about using indexes/statistics

Posted by Korry Douglas <ko...@me.com>.
Actually, I don’t think I got it quite right.  I wrote:

> If I understand correctly, I should be able to:
> 
> 1) skip the entire file based on the statistics returned by Reader::getStatistics()
> 
> 2) skip individual stripes based on the statistics returned by Reader::getStripeStatistics(stripeNumber)
> 
> 3) skip individual row groups based on the stats returned by StripeStatistics::getRowIndexStatistics(columnId, indexId)


But it looks like there are not really three levels of stats.

Reader::getStatistics() will tell me whether I can skip the entire file (because the predicates are refuted by the stats)

But Reader::getStripeStatistics(stripeNumber) returns a pointer to a StripeStatistics object. Using that pointer, I can fetch the stats for each row group in the stripe, but there is no overall summary that gives me the min and max for the entire stripe.

So if I have 5 stripes, each containing 10 row groups, I can’t skip any given stripe without evaluating 10 sets of statistics (one for each of the 10 row groups in the stripe).  Is that correct?

Are the row group statistics (row index stats) stored in the file footer or will I be doing a lot of seeking within the file in order to find those 500 stats objects?

Thanks again for the help.


                — Korry



> On Mar 4, 2019, at 12:19 PM, ustcwg@gmail.com wrote:
> 
> Hi Korry,
> 
> Yes, you are right. You can get file-, stripe-, and row-group-level stats using those interfaces and then implement your predicate pushdown logic. seekToRow is the right function to call to skip to a stripe or row group. What you need to do is compute the start of the stripe or row group, based on the offsets in the file footer and stripe footer for your target stripe, or on the positions in the index streams in the stripe for your target row group.
> 
> The batch size is just a hint for how many rows to fetch. The reader will try to read the number of rows you asked for, but it will stop at the stripe boundary.
> 
> Best,
> Gang


Re: Question about using indexes/statistics

Posted by Korry Douglas <ko...@me.com>.
Thanks for the quick answer.  Now it’s just a small matter of programming :-)

            — Korry

> On Mar 4, 2019, at 12:19 PM, ustcwg@gmail.com wrote:
> 
> Hi Korry,
> 
> Yes, you are right. You can get file-, stripe-, and row-group-level stats using those interfaces and then implement your predicate pushdown logic. seekToRow is the right function to call to skip to a stripe or row group. What you need to do is compute the start of the stripe or row group, based on the offsets in the file footer and stripe footer for your target stripe, or on the positions in the index streams in the stripe for your target row group.
> 
> The batch size is just a hint for how many rows to fetch. The reader will try to read the number of rows you asked for, but it will stop at the stripe boundary.
> 
> Best,
> Gang


Re: Question about using indexes/statistics

Posted by us...@gmail.com.
Hi Korry,

Yes, you are right. You can get file-, stripe-, and row-group-level stats using those interfaces and then implement your predicate pushdown logic. seekToRow is the right function to call to skip to a stripe or row group. What you need to do is compute the start of the stripe or row group, based on the offsets in the file footer and stripe footer for your target stripe, or on the positions in the index streams in the stripe for your target row group.
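
Since seekToRow takes a row number, one simple way to get a seek target is to work from row counts rather than byte offsets. The sketch below is an illustration under that assumption, not the approach the library itself uses: stripeStartRows and rowGroupStartRow are hypothetical helpers, the per-stripe row counts are assumed to have been collected from each stripe's information object beforehand, and the stride is assumed to be the file's row index stride (10000 in the file info above).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// First row number of every stripe: a running sum of the per-stripe row counts.
std::vector<uint64_t> stripeStartRows(const std::vector<uint64_t>& rowsPerStripe) {
    std::vector<uint64_t> starts;
    starts.reserve(rowsPerStripe.size());
    uint64_t next = 0;
    for (uint64_t rows : rowsPerStripe) {
        starts.push_back(next);
        next += rows;
    }
    return starts;
}

// First row number of a row group, given the start row of its stripe.
// Every row group except possibly the last in a stripe holds exactly
// rowIndexStride rows, so the start of group i is a fixed multiple.
uint64_t rowGroupStartRow(uint64_t stripeStartRow,
                          uint64_t rowGroupIndex,
                          uint64_t rowIndexStride) {
    assert(rowIndexStride > 0);
    return stripeStartRow + rowGroupIndex * rowIndexStride;
}
```

To skip a refuted stripe s, the target passed to seekToRow would then be the start row of stripe s + 1, taken from the table, instead of a value derived from the reader's current position.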

The batch size is just a hint for how many rows to fetch. The reader will try to read the number of rows you asked for, but it will stop at the stripe boundary.
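
The batch behavior described above can be modeled with a few lines of arithmetic. This is a model of the stated rule, not the reader's actual code; rowsInNextBatch and its parameters are hypothetical names, and the stripe sizes used with it are up to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A batch holds at most batchCapacity rows and never extends past the end
// of the stripe that contains currentRow.
uint64_t rowsInNextBatch(uint64_t currentRow, uint64_t batchCapacity,
                         uint64_t stripeStartRow, uint64_t stripeRowCount) {
    const uint64_t stripeEnd = stripeStartRow + stripeRowCount;
    assert(currentRow >= stripeStartRow && currentRow <= stripeEnd);
    return std::min(batchCapacity, stripeEnd - currentRow);
}
```

For example, with a capacity of 1000 and a stripe of 5000 rows starting at row 0, a read that begins at row 4500 yields 500 rows. The practical consequence is that a read loop should use the row count reported in the returned batch rather than assume every batch is full.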

Best,
Gang
