Posted to dev@avro.apache.org by cu...@apache.org on 2010/03/25 17:36:13 UTC

Re: My mainly idea about implement data communicate tool between json/xml/csv and avro data files

> On Thu, Mar 25, 2010 at 12:52 AM, Scott Carey <sc...@richrelevance.com> wrote:
>>  I'm not sure it makes sense to map Avro data into CSV.

I agree that mapping arbitrary Avro data into CSV is difficult.  But, 
for some cases it might be sensible, for example, when the top-level 
schema is a record whose fields are primitive types.  In general, one 
could simply flatten the schema to primitive types, and escape values 
which contain commas.  This will not work well with recursive schemas or 
unions, and one can only restore such a format to Avro if one has the 
identical schema, but I think these might be acceptable, necessary 
limitations.  Errors can be generated if these conditions are not met.
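
As a rough illustration of the flattening described above, here is a
minimal Python sketch. It assumes the schema is available as parsed
JSON (plain dicts and lists) and the data as nested dicts; the names
flatten_columns, lookup and to_csv are hypothetical and not part of any
Avro API. Nested records become dotted column names, the csv module
quotes values that contain commas, and anything else (unions, arrays,
maps, named references such as a recursive type) raises an error.

```python
import csv
import io

# The eight Avro primitive type names. How to encode "bytes" and "null"
# values as CSV text is left open in this sketch.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double",
              "bytes", "string"}


def flatten_columns(schema, prefix=""):
    """Return dotted column names for a record schema, or raise ValueError."""
    if isinstance(schema, list):
        # In JSON form, an Avro union is written as a list of schemas.
        raise ValueError(f"union at '{prefix}' cannot be flattened")
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return [prefix]
        # A bare name refers to a previously defined type, which is how
        # recursive schemas refer to themselves.
        raise ValueError(f"named type '{schema}' at '{prefix}' is not supported")
    kind = schema["type"]
    if kind == "record":
        columns = []
        for field in schema["fields"]:
            path = f"{prefix}.{field['name']}" if prefix else field["name"]
            columns.extend(flatten_columns(field["type"], path))
        return columns
    if isinstance(kind, str) and kind in PRIMITIVES:
        return [prefix]
    raise ValueError(f"type '{kind}' at '{prefix}' cannot be flattened")


def lookup(datum, column):
    """Follow a dotted column name into a nested dict."""
    value = datum
    for part in column.split("."):
        value = value[part]
    return value


def to_csv(schema, records):
    """Write a header row plus one row per record; returns the CSV text."""
    columns = flatten_columns(schema)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # quotes values with commas
    writer.writerow(columns)
    for datum in records:
        writer.writerow([lookup(datum, c) for c in columns])
    return out.getvalue()


user_schema = {
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "address", "type": {
            "type": "record", "name": "Address",
            "fields": [{"name": "city", "type": "string"},
                       {"name": "zip", "type": "int"}]}},
    ],
}
users = [{"name": "Smith, Jane", "address": {"city": "Oslo", "zip": 150}}]
print(to_csv(user_schema, users))
```

Restoring such a file to Avro would still need the identical schema,
since the CSV text alone does not carry the types.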

Peng Cui wrote:
> But i think, why we do not generate a schema for
> each CSV data file?

Yes, I think such an approach could be practical and useful.
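
To make the idea concrete, here is a minimal Python sketch of
generating a schema for a CSV file. It is only one possible approach,
under these assumptions: the first row is a header whose entries are
valid Avro field names, every row has the same number of columns, and
a column's type is guessed by trying long, then double, then falling
back to string. The names infer_type and schema_from_csv and the
default record name CsvRow are hypothetical.

```python
import csv
import io


def infer_type(values):
    """Pick the first of long, double, string that fits every value."""
    for avro_type, parse in (("long", int), ("double", float)):
        try:
            for value in values:
                parse(value)
            return avro_type
        except ValueError:
            continue
    return "string"


def schema_from_csv(text, record_name="CsvRow"):
    """Build a flat Avro record schema (as a dict) from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        fields.append({"name": name, "type": infer_type(column)})
    return {"type": "record", "name": record_name, "fields": fields}


sample = "id,score,city\n1,2.5,Oslo\n2,3,Rome\n"
print(schema_from_csv(sample))
```

A schema produced this way is always a flat record of primitive
fields, so it avoids the union and recursion problems by construction.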

We should consider use cases.  One use case is exporting Avro data to 
tools that accept CSV, e.g., a spreadsheet.  A spreadsheet will never 
represent the full structure of Avro data, but, when possible, it might 
still be useful to be able to export Avro data to a spreadsheet.

Doug

Re: My mainly idea about implement data communicate tool between json/xml/csv and avro data files

Posted by Peng Cui <aj...@gmail.com>.
Hi Doug,
It seems that we have reached a consensus that generating a special schema
for CSV data is a useful approach, but I think this can only handle some
special Avro data.

If the Avro data file contains recursive types or unions, I think it is hard
or even impossible to convert it to a CSV data file. As you said,
"these might be acceptable, necessary limitations".

But storing CSV data as Avro data is always feasible, so we reach a
conclusion:
CSV to Avro: ok
Avro to CSV: partial

Peng
