Posted to dev@avro.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2010/03/12 05:01:27 UTC

[jira] Commented: (AVRO-459) Allow lazy reading of large fields from data files

    [ https://issues.apache.org/jira/browse/AVRO-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844349#action_12844349 ] 

Aaron Kimball commented on AVRO-459:
------------------------------------

I have a use case for creating files where individual fields are very large (possibly hundreds of MB), and I would like to be able to store these records in Avro files. The large fields in question are just byte arrays; a record contains one such field and possibly an identifier of some sort.

The byte array itself may be too big to materialize in RAM. It would be good to have a "lazy" reader that can seek to an arbitrary record boundary and then return an InputStream (or a Reader for character-based data), which I could use to pull in more of the field's contents as I need to process them. It would be even better if the returned stream could skip past uninteresting parts of the byte array, all the way to its end.

When iterating over records in the file, the file reader should simply seek past these fields rather than scanning their entire contents (even if I make use of other fields of the same record).
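
To make the request more concrete, here is a rough sketch of what such a stream could look like. It is not an existing Avro API: the class LazyBytesField and its methods open, remaining and skipToEnd are hypothetical names made up for illustration. The sketch relies only on the fact that Avro's binary encoding writes a bytes value as a zig-zag varint length followed by that many bytes, and it assumes the caller has already positioned an uncompressed InputStream at the start of the field.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of Avro): streams one length-prefixed bytes field. */
final class LazyBytesField extends InputStream {
    private final InputStream in; // underlying stream, positioned inside the field
    private long remaining;       // bytes of the field not yet consumed

    private LazyBytesField(InputStream in, long length) {
        this.in = in;
        this.remaining = length;
    }

    /** Reads the zig-zag varint length prefix, then returns a stream bounded to the field. */
    static LazyBytesField open(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated length prefix");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;   // high bit clear: last byte of the varint
            shift += 7;
            if (shift > 63) throw new IOException("length prefix too long");
        }
        long length = (raw >>> 1) ^ -(raw & 1); // undo zig-zag encoding
        if (length < 0) throw new IOException("negative field length");
        return new LazyBytesField(in, length);
    }

    /** Number of bytes of the field that have not been read or skipped yet. */
    long remaining() {
        return remaining;
    }

    @Override
    public int read() throws IOException {
        if (remaining == 0) return -1;    // end of this field, not of the file
        int b = in.read();
        if (b < 0) throw new EOFException("field truncated");
        remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining == 0) return -1;
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n < 0) throw new EOFException("field truncated");
        remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Never skip beyond the end of the field.
        long skipped = in.skip(Math.min(Math.max(n, 0), remaining));
        remaining -= skipped;
        return skipped;
    }

    /** Skips the rest of the field so the underlying stream sits at the next value. */
    void skipToEnd() throws IOException {
        while (remaining > 0) {
            // If skip() makes no progress, fall back to a single read(),
            // which either consumes one byte or reports truncation.
            if (skip(remaining) == 0) read();
        }
    }
}
```

With something like this, a caller could read only the first few KB of a field and then call skipToEnd() to move on to the next field or record, so the whole array never has to be held in memory. Whether skipping is cheap depends on the underlying stream; over a seekable file it can be a seek, but over a compressed block it would still require decompressing the skipped bytes.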

> Allow lazy reading of large fields from data files
> --------------------------------------------------
>
>                 Key: AVRO-459
>                 URL: https://issues.apache.org/jira/browse/AVRO-459
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Aaron Kimball
>
> The current file reader will attempt to materialize individual fields entirely in RAM. If a record is too big to fit in RAM, it would be good to get a stream-based API to very large fields.
