Posted to user@avro.apache.org by Anna Lahoud <an...@gmail.com> on 2013/08/16 21:23:57 UTC

Is there a way to conditionally read Avro data?

I am wondering if there is a way that I can avoid reading all of an item in
an Avro file, based on some of the data that I have already read. For
instance, say I have a datum and I know that if its 'type' value is
'ComputerVirus', I do not want to touch the remaining fields. Is
there a way to 'move on' and get the next datum, without touching the
remainder of the scary datum? I would call it a 'conditional read' in that
I only want to fully read the datum if the datum meets some criteria.

Anna

Re: Is there a way to conditionally read Avro data?

Posted by Harsh J <ha...@cloudera.com>.
What Eric suggests (reader schemas) would work, but it may incur a double
read cost: when the header-only read shows that the condition is met, you
have to read the same record again to get the rest of it.

If the field you are testing is stored early in the record, then perhaps
a custom DatumReader implementation (one that does the low-level
deserialization itself) may work more effectively. You can pass a
DatumReader when constructing a DataFileReader, but it's quite a long
route to go, IMO.
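
To make the low-level idea concrete, here is a minimal sketch that does not use the Avro library at all. It only imitates the relevant part of Avro's binary encoding: a record is written as its fields in schema order, and a string or bytes value is a zig-zag varint length followed by the raw bytes. The record layout {type: string, body: bytes} and all class and method names are hypothetical, and a real data file also has a header, sync markers and (possibly compressed) blocks that this ignores. In a real custom DatumReader the same effect would come from the skipString()/skipBytes() methods on Avro's Decoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: read a leading "type" field, then skip the body unread. */
final class ConditionalReadSketch {

    /** Writes a long as a zig-zag varint, the way Avro encodes int/long. */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);            // zig-zag encode
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits + continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Reads a zig-zag varint long. */
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);              // zig-zag decode
    }

    /** Writes a length-prefixed byte sequence (used for string and bytes). */
    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    /** Reads a length-prefixed byte sequence into a new array. */
    static byte[] readBytes(ByteBuffer in) {
        byte[] b = new byte[(int) readLong(in)];
        in.get(b);
        return b;
    }

    /** Encodes one hypothetical record {type: string, body: bytes}. */
    static void writeRecord(ByteArrayOutputStream out, String type, byte[] body) {
        writeBytes(out, type.getBytes(StandardCharsets.UTF_8));
        writeBytes(out, body);
    }

    /** Returns the bodies of all records whose type is not {@code blocked}. */
    static List<byte[]> readUnlessType(ByteBuffer in, String blocked) {
        List<byte[]> bodies = new ArrayList<>();
        while (in.hasRemaining()) {
            String type = new String(readBytes(in), StandardCharsets.UTF_8);
            if (type.equals(blocked)) {
                // Conditional skip: read only the length prefix, then jump
                // past the payload without copying any of it.
                int len = (int) readLong(in);
                in.position(in.position() + len);
            } else {
                bodies.add(readBytes(in));
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeRecord(out, "Document", new byte[] {1, 2, 3});
        writeRecord(out, "ComputerVirus", new byte[200]);
        List<byte[]> kept = readUnlessType(ByteBuffer.wrap(out.toByteArray()), "ComputerVirus");
        System.out.println(kept.size() + " body/bodies read");
    }
}
```

The point of the sketch is that the skip is only cheap because 'type' comes first: the reader has to pass over every field that precedes the one it tests, which is why the field order in the writer schema matters here.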

On Sat, Aug 17, 2013 at 4:17 AM, Eric Wasserman <ew...@247-inc.com> wrote:
> If you write your records with a schema like this (in Avro IDL, for
> brevity):
>
>   record R {
>     Header header;
>     Body body;
>   }
>
> Then you can read with a schema like this:
>
>   record RSansBody {
>     Header header;
>   }
>
> And the Avro libraries will read the header part (in which your "type" would
> reside) and effectively skip the body part.



-- 
Harsh J

RE: Is there a way to conditionally read Avro data?

Posted by Eric Wasserman <ew...@247-inc.com>.
If you write your records with a schema like this (in Avro IDL, for brevity):

  record R {
    Header header;
    Body body;
  }

Then you can read with a schema like this:

  record RSansBody {
    Header header;
  }

And the Avro libraries will read the header part (in which your "type" would reside) and effectively skip the body part.
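
For reference, a header-only read along these lines might look like the sketch below, using Avro's generic Java API. It assumes the file was written with the R schema above and that Header contains a single string field named "type"; that field layout and the class and method names are hypothetical. The reader schema here keeps the record name R rather than RSansBody. That is a cautious assumption, since the Avro specification resolves records by name, and an alias would be another option.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

/** Sketch only: project each record down to its header while reading. */
final class HeaderOnlyReader {

    // Reader schema: the writer's record minus the "body" field.
    // The "type" field inside Header is an assumed layout.
    static final Schema READER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
      + "{\"name\":\"header\",\"type\":{\"type\":\"record\",\"name\":\"Header\","
      + "\"fields\":[{\"name\":\"type\",\"type\":\"string\"}]}}]}");

    /** Returns the header 'type' of every record in the file. */
    static List<String> readTypes(File avroFile) throws IOException {
        // The writer schema is left null: DataFileReader fills it in from
        // the schema stored in the file header.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<>(null, READER);
        List<String> types = new ArrayList<>();
        try (DataFileReader<GenericRecord> in =
                 new DataFileReader<>(avroFile, datumReader)) {
            for (GenericRecord r : in) {
                // Only "header" is materialized; the writer's "body" field
                // is skipped during schema resolution.
                GenericRecord header = (GenericRecord) r.get("header");
                types.add(header.get("type").toString());
            }
        }
        return types;
    }
}
```

Records that pass the check would still need a second read with the full schema to get at their body.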
