Posted to dev@avro.apache.org by "Harsh J Chouraria (JIRA)" <ji...@apache.org> on 2010/08/19 21:53:18 UTC

[jira] Commented: (AVRO-458) add tools that read/write CSV records from/to avro data files

    [ https://issues.apache.org/jira/browse/AVRO-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900415#action_12900415 ] 

Harsh J Chouraria commented on AVRO-458:
----------------------------------------

I've put together a simple CLI tool in Python that does the following (with some tunable options):

CSV to Avro ->
  1. Pass a schema file; if none is given, the tool generates one from the CSV header with every field typed as string.
  2. Read each CSV record (from a list of input files), split it on the given delimiter (default ','), and convert each value to the type its schema field declares.
  P.S. If a type conversion fails (say, a null where the CSV is supposed to hold a float), the tool checks whether the schema field has a default and uses it; otherwise it throws an informative exception. I know this stretches the meaning of 'default' in the schema, but it's a great feature to have!
  3. Write these records to a data file.
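
A rough sketch of steps 1 and 2, including the default fallback, using only the Python standard library. The names here (CONVERTERS, schema_from_header, convert_record, read_csv_records, the record name "CsvRecord") are made up for illustration and are not taken from avroutils. The sketch assumes a flat record schema with primitive type names, a header row in the CSV, and schema fields listed in the same order as the CSV columns. Writing the data file (step 3) is left out.

```python
import csv
import io

# Hypothetical mapping from Avro primitive type names to Python converters.
CONVERTERS = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.strip().lower() in ("true", "1"),
}


def schema_from_header(header, name="CsvRecord"):
    """Build a record schema (as a dict) with every column typed as string."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": col, "type": "string"} for col in header],
    }


def convert_record(row, schema):
    """Convert a list of CSV strings into a dict typed per the schema fields."""
    record = {}
    for field, raw in zip(schema["fields"], row):
        try:
            record[field["name"]] = CONVERTERS[field["type"]](raw)
        except ValueError as exc:
            # Conversion failed: fall back to the field's default if it has one.
            if "default" not in field:
                raise ValueError(
                    "cannot convert %r for field %r (type %r) and no default is set"
                    % (raw, field["name"], field["type"])
                ) from exc
            record[field["name"]] = field["default"]
    return record


def read_csv_records(text, schema=None, delimiter=","):
    """Parse CSV text; generate an all-string schema if none is passed."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    schema = schema or schema_from_header(header)
    return schema, [convert_record(row, schema) for row in reader]
```

With a schema whose "score" field is a float with default 0.0, a row such as "bob," (empty score) would come out as {"name": "bob", "score": 0.0} instead of failing; without the default, the same row raises a ValueError naming the field.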

Avro to CSV ->
 1. Pass a schema to read only selected fields; otherwise the file is read with its full schema.
 2. Read each record [only record types work for now] and convert all values to strings. Many Avro files can be read into a single CSV file.
 3. Write to a CSV file with an optional header.
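
The reverse direction can be sketched the same way. Plain dicts stand in for the records that would be read out of the Avro data files, and the list of field names stands in for the fields selected by the reader schema. The function name records_to_csv and its parameters are hypothetical, not the tool's actual interface.

```python
import csv
import io


def records_to_csv(records, field_names, out, delimiter=",", write_header=True):
    """Write the selected fields of each record to `out` as CSV strings."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if write_header:
        writer.writerow(field_names)
    for record in records:
        # Every value is converted to its string form before writing.
        writer.writerow([str(record[name]) for name in field_names])


# Usage: project two of three fields into an in-memory CSV.
buf = io.StringIO()
records_to_csv(
    [{"id": 7, "name": "ann", "score": 1.5}],
    ["name", "score"],
    buf,
)
csv_text = buf.getvalue()
```

Records from several input files can be chained into one iterable (for example with itertools.chain) and passed in a single call, so they end up in one CSV file.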

The code is still a work in progress and currently lives on GitHub at http://github.com/QwertyManiac/avroutils, but I'll submit it as a formal patch once it feels complete.

I'm posting this comment to gather suggestions, e.g. on what to extend.

> add tools that read/write CSV records from/to avro data files
> -------------------------------------------------------------
>
>                 Key: AVRO-458
>                 URL: https://issues.apache.org/jira/browse/AVRO-458
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Doug Cutting
>
> It might be useful to have command-line tools that can read & write arbitrary CSV data from & to Avro data files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.