Posted to dev@avro.apache.org by Peng Cui <aj...@gmail.com> on 2010/03/22 06:31:09 UTC

My mainly idea about implement data communicate tool between json/xml/csv and avro data files

Hi Doug,

My name is Cui Peng. I want to implement the tool for moving data between
json/xml/csv and Avro data files that you described in the GSoC 2010 idea
list. I checked out the Avro source code and studied its design and
architecture, and I now have a rough idea of how to build the tool. I
describe it below and look forward to your advice :-)

I think there are two main parts to the work:

1. Read/write json/xml/csv records from/to Avro data files
There are two steps:

Step one: convert json/xml/csv records to and from Avro datums

For JSON:
Avro already supplies ParsingDecoder and JsonGenerator, so we can use these
two classes to move data between Avro datums and JSON data.
For XML:
I must extend the abstract ParsingDecoder to build an XMLDecoder class that
parses data from an XML file and converts it to Avro datums. An
XMLGenerator class that converts Avro datums to an XML data file is also
necessary. This part needs some XML parsing; Apache Xerces may be a good
choice, and fortunately I am familiar with it.
For CSV:
Likewise, I must build a CSVDecoder to convert CSV data to Avro datums and
a CSVGenerator class to convert Avro datums to CSV files. This part needs
some CSV handling; I think Apache Commons CSV can help us.

Step two: read/write Avro datums from/to Avro data files
Avro has implemented this already, so it will not cost me much time and
energy.
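
To make the XML half of step one more concrete, here is a minimal sketch of
reading row-style XML into generic name/value maps. It is only an
illustration under stated assumptions: it uses the JDK's built-in DOM parser
rather than a ParsingDecoder subclass, the class name XmlRowsSketch and the
<rows>/<row> layout are hypothetical, and a java.util.Map stands in for an
Avro datum.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not an Avro class: a Map stands in for an Avro datum.
final class XmlRowsSketch {
    /** Parses <rows><row><field>value</field>...</row>...</rows> into one map per row. */
    static List<Map<String, String>> parseRows(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        List<Map<String, String>> rows = new ArrayList<>();
        NodeList rowNodes = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < rowNodes.getLength(); i++) {
            Node rowNode = rowNodes.item(i);
            if (rowNode.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace between elements
            }
            Map<String, String> row = new LinkedHashMap<>(); // keeps field order
            NodeList fieldNodes = rowNode.getChildNodes();
            for (int j = 0; j < fieldNodes.getLength(); j++) {
                Node field = fieldNodes.item(j);
                if (field.getNodeType() == Node.ELEMENT_NODE) {
                    row.put(field.getNodeName(), field.getTextContent());
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parseRows("<rows><row><c1>one</c1><c2>two</c2></row></rows>"));
    }
}
```

A real XMLDecoder would feed these values to Avro according to a schema
instead of collecting them in maps.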

2. A Swing-based command-line tool. This tool will help us execute
commands, collect data from user input, etc.
Step one gives us data exchange between json/xml/csv data files and Avro
data files; next, we should build the command-line tool and design its
command system.

1). The tool will have three modes (json, xml, or csv), with a special
command to switch the working mode.
2). The tool will support two data input modes: from the keyboard or from
an existing data file.
3). Its commands take a command-and-argument form. For example, "input -f"
means import data from existing data files, and "input -k" means give the
user a graphical data input area where data can be typed in.
4). A data output format function.
5). If an exception occurs, it will be shown in the tool.
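
The command-and-argument form in item 3 could be parsed roughly as in the
sketch below. The class and method names (CommandLineSketch, Parsed,
inputSource) are hypothetical, and only the two flags named above are
handled; everything else is reported as an error.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a typed line into a command name and arguments.
final class CommandLineSketch {
    /** A parsed command line: the command name plus its arguments. */
    static final class Parsed {
        final String name;
        final List<String> args;

        Parsed(String name, List<String> args) {
            this.name = name;
            this.args = args;
        }
    }

    /** Splits a line such as "input -f data.csv" on whitespace. */
    static Parsed parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty command");
        }
        return new Parsed(tokens[0], Arrays.asList(tokens).subList(1, tokens.length));
    }

    /** Maps the "input" flags from the proposal to the kind of input source. */
    static String inputSource(Parsed parsed) {
        if (!parsed.name.equals("input") || parsed.args.isEmpty()) {
            throw new IllegalArgumentException("usage: input -f <file> | input -k");
        }
        switch (parsed.args.get(0)) {
            case "-f": return "file";      // import from an existing data file
            case "-k": return "keyboard";  // open the graphical input area
            default:
                throw new IllegalArgumentException("unknown flag: " + parsed.args.get(0));
        }
    }

    public static void main(String[] args) {
        System.out.println(inputSource(parse("input -f data.csv")));
    }
}
```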


That is all. If you have any ideas, please let me know. Thank you and best
regards

Re: My mainly idea about implement data communicate tool between json/xml/csv and avro data files

Posted by Peng Cui <aj...@gmail.com>.
Hi all,

These days I have been doing some coding for the tool. It currently targets
json, xml and csv; why don't we make it support more, such as reading and
writing Excel sheets, web services, or even databases? I think a way to move
data between Avro and a database would be very practical. Maybe we could
build a database backup tool for all database management systems based on
this tool. Wouldn't that be cool?

Re: My mainly idea about implement data communicate tool between json/xml/csv and avro data files

Posted by Peng Cui <aj...@gmail.com>.
Hi Doug,
It seems we have reached a consensus that generating a special schema for
CSV data is a useful approach, but I think this can only handle some
special Avro data.

If the Avro data file contains recursive or union data, I think it is hard
or even impossible to convert this Avro data to a CSV data file. As you
said, "these might be acceptable, necessary limitations".

But storing CSV data as Avro data is always feasible, so we reach a
conclusion:
CSV to Avro: ok
Avro to CSV: partial
Peng

On Fri, Mar 26, 2010 at 12:36 AM, <cu...@apache.org> wrote:

>> On Thu, Mar 25, 2010 at 12:52 AM, Scott Carey <scott@richrelevance.com>
>> wrote:
>>> I'm not sure it makes sense to map Avro data into CSV.
>
> I agree that mapping arbitrary Avro data into CSV is difficult.  But, for
> some cases it might be sensible, for example, when the top-level schema is a
> record whose fields are primitive types.  In general, one could simply
> flatten the schema to primitive types, and escape values which contain
> commas.  This will not work well with recursive schemas or unions, and one
> can only restore such a format to Avro if one has the identical schema, but
> I think these might be acceptable, necessary limitations.  Errors can be
> generated if these conditions are not met.
>
>
> Peng Cui wrote:
>
>> But i think, why we do not generate a schema for
>> each CSV data file?
>>
>
> Yes, I think such an approach could be practical and useful.
>
> We should consider use cases.  One use case is exporting Avro data to
> tools that accept CSV, e.g., a spreadsheet.  A spreadsheet will never
> represent the full structure of Avro data, but, when possible, it might
> still be useful to be able to export Avro data to a spreadsheet.
>
> Doug
>

Re: My mainly idea about implement data communicate tool between json/xml/csv and avro data files

Posted by cu...@apache.org.
> On Thu, Mar 25, 2010 at 12:52 AM, Scott Carey <sc...@richrelevance.com> wrote:
>>  I'm not sure it makes sense to map Avro data into CSV.

I agree that mapping arbitrary Avro data into CSV is difficult.  But, 
for some cases it might be sensible, for example, when the top-level 
schema is a record whose fields are primitive types.  In general, one 
could simply flatten the schema to primitive types, and escape values 
which contain commas.  This will not work well with recursive schemas or 
unions, and one can only restore such a format to Avro if one has the 
identical schema, but I think these might be acceptable, necessary 
limitations.  Errors can be generated if these conditions are not met.
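
A minimal sketch of the simple case described above: a top-level record
whose fields are all primitive is written as one CSV line, and anything else
raises an error. The names are hypothetical, a Map from field name to Avro
type name stands in for a real Avro schema, and nested types are rejected
rather than flattened.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper: the Map arguments stand in for an Avro schema and record.
final class FlatRecordCsvSketch {
    /** The eight Avro primitive type names. */
    static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
            "null", "boolean", "int", "long", "float", "double", "bytes", "string"));

    /**
     * Writes one record as a CSV line. fieldTypes maps each field name to its
     * Avro type name; its iteration order decides the column order.
     */
    static String toCsvLine(Map<String, String> fieldTypes, Map<String, Object> record) {
        StringJoiner line = new StringJoiner(",");
        for (Map.Entry<String, String> field : fieldTypes.entrySet()) {
            if (!PRIMITIVES.contains(field.getValue())) {
                // Arrays, maps, unions and nested records are reported as errors.
                throw new IllegalArgumentException("field '" + field.getKey()
                        + "' has non-primitive type " + field.getValue());
            }
            line.add(escape(String.valueOf(record.get(field.getKey()))));
        }
        return line.toString();
    }

    /** Quotes a value that contains a comma, a double quote, or a line break. */
    static String escape(String value) {
        boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }
}
```

Reading such a line back into Avro still needs the identical schema, since
the CSV line itself carries no field names or types.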

Peng Cui wrote:
> But i think, why we do not generate a schema for
> each CSV data file?

Yes, I think such an approach could be practical and useful.

We should consider use cases.  One use case is exporting Avro data to 
tools that accept CSV, e.g., a spreadsheet.  A spreadsheet will never 
represent the full structure of Avro data, but, when possible, it might 
still be useful to be able to export Avro data to a spreadsheet.

Doug

Re: My mainly idea about implement data communicate tool between json/xml/csv and avro data files

Posted by Peng Cui <aj...@gmail.com>.
Hi Scott,

Thank you for your reply, please see my comments below:

On Thu, Mar 25, 2010 at 12:52 AM, Scott Carey <sc...@richrelevance.com> wrote:

> For CSV there is a disconnect on data types.   The "ordinary" CSV is
> typically quoted per field, and requires escaping of data to ensure that
> delimiters and quote characters don't exist in the data without escapes.
> Furthermore, it does not handle Arrays, Maps, Unions, or nested objects.
>  I'm not sure it makes sense to map Avro data into CSV.
>
Yes, the CSV data structure is simple: it is made up of rows of values
separated by commas (,). If a value contains a comma, for example
data,withcomma, CSV stores that value in quotes: "data,withcomma". If a
value contains a quote, for example go"od, CSV doubles each quote and adds
a quote at both the beginning and the end, so go"od becomes "go""od" (and
a value that already includes the outer quotes, "go"od", becomes
"""go""od""").

So if we want to store CSV data in an Avro data file, I think we need some
pre-processing, such as handling the escaped quote characters, etc.
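
The quoting rules above, and the pre-processing needed to undo them, can be
sketched as follows. This is an illustration only: the class name is
hypothetical, it handles a single line (no line breaks inside quoted
fields), and a library such as Apache Commons CSV would normally do this
work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing CSV quoting and the matching un-quoting.
final class CsvQuotingSketch {
    /** Quotes a field only when it contains a comma, a double quote, or a line break. */
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Splits one CSV line into fields, undoing the quoting applied by quote(). */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // a doubled quote is one literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote of the field
                } else {
                    current.append(c);   // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote of the field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(quote("go\"od"));
        System.out.println(parseLine("one,\"data,withcomma\",\"go\"\"od\""));
    }
}
```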


> For example an avro schema can be a linked list, or even a binary tree with
> a structure at each node -- CSV typically has a fixed set of fields per row.
>  Extending it to be capable of nested records, lists, unions, and maps would
> make it incompatible with generic CSV reader/writers.  However, going the
> other way, and mapping a CSV into a subset of Avro would work -- Avro could
> read CSV as a fixed simple record, but it could not write an arbitrary
> record as CSV.
>
Yes, CSV data has no schema, so if we store it in an Avro data file it may
not be easy to restore. But why don't we generate a schema for each CSV
data file? We can handle CSV as a special case of a JSON or XML data file.
For example, suppose we have the following CSV data:
one,two,three
four,five,six
This CSV data file has two rows of three columns each. We can treat it as
JSON data:
[
{"c1":"one","c2":"two","c3":"three"},
{"c1":"four","c2":"five","c3":"six"}
]
or handle it as XML data:
<rows>
<row><c1>one</c1><c2>two</c2><c3>three</c3></row>
<row><c1>four</c1><c2>five</c2><c3>six</c3></row>
</rows>
Then we generate the schema for the JSON data or for the XML data. I think
this is feasible and not very difficult.
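
As a rough sketch of what a generated schema could look like, the code
below builds Avro record schema JSON with one string field per CSV column,
named c1..cN as in the example. The class name, the record name CsvRow, and
the choice to type every column as string are assumptions; the column count
uses a naive comma split that ignores quoted commas.

```java
// Hypothetical helper: produces schema text only, it does not call Avro.
final class CsvSchemaSketch {
    /** Builds an Avro record schema (as JSON text) with string fields c1..cN. */
    static String schemaFor(String recordName, int columnCount) {
        if (columnCount < 1) {
            throw new IllegalArgumentException("need at least one column");
        }
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"record\",\"name\":\"").append(recordName)
            .append("\",\"fields\":[");
        for (int i = 1; i <= columnCount; i++) {
            if (i > 1) {
                json.append(',');
            }
            json.append("{\"name\":\"c").append(i).append("\",\"type\":\"string\"}");
        }
        return json.append("]}").toString();
    }

    /** Derives the schema from the first CSV line (naive: no quoted commas). */
    static String schemaForFirstLine(String recordName, String firstLine) {
        // limit -1 keeps trailing empty columns such as "a,b,"
        return schemaFor(recordName, firstLine.split(",", -1).length);
    }

    public static void main(String[] args) {
        System.out.println(schemaForFirstLine("CsvRow", "one,two,three"));
    }
}
```

With a schema like this, each CSV row maps to one record, which is the
"fixed simple record" case; column types other than string would need extra
inference or user input.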

>
> XML, is sufficiently rich to create an Avro serialization scheme for, and
> one could probably define a XML DTD for an avro compatible serialization
> format.
>
Yes, one could define an XML DTD for an Avro-compatible serialization
format, but I think we should handle XML Schema files too; as far as I
know, XML Schema is becoming more and more popular :-)

>
> Currently the Avro data files only store the data internally as binary,
> since there has been no need to store data in a larger and less efficient
> format.  However, reading the binary file as JSON has been built as a
> command-line tool for debugging purposes.
>
By the command-line tool, do you mean the classes
org.apache.avro.tool.BinaryFragmentToJsonTool and
org.apache.avro.tool.JsonToBinaryFragmentTool? Thank you

>
> On Mar 24, 2010, at 1:33 AM, Peng Cui wrote:
>
> > Hi all,
> >
> > I almost finished my GSoC proposal about the project " a data communicate
> > tool between json/xml/csv and avro data files".I will describe it for you
> > and expecting your advises.

Re: My mainly idea about implement data communicate tool between json/xml/csv and avro data files

Posted by Scott Carey <sc...@richrelevance.com>.
For CSV there is a disconnect on data types.   The "ordinary" CSV is typically quoted per field, and requires escaping of data to ensure that delimiters and quote characters don't exist in the data without escapes.   Furthermore, it does not handle Arrays, Maps, Unions, or nested objects.  I'm not sure it makes sense to map Avro data into CSV.

For example an avro schema can be a linked list, or even a binary tree with a structure at each node -- CSV typically has a fixed set of fields per row.  Extending it to be capable of nested records, lists, unions, and maps would make it incompatible with generic CSV reader/writers.  However, going the other way, and mapping a CSV into a subset of Avro would work -- Avro could read CSV as a fixed simple record, but it could not write an arbitrary record as CSV.

XML is sufficiently rich to create an Avro serialization scheme for, and one could probably define an XML DTD for an Avro-compatible serialization format.

Currently the Avro data files only store the data internally as binary, since there has been no need to store data in a larger and less efficient format.  However, reading the binary file as JSON has been built as a command-line tool for debugging purposes.

On Mar 24, 2010, at 1:33 AM, Peng Cui wrote:

> Hi all,
> 
> I almost finished my GSoC proposal about the project " a data communicate
> tool between json/xml/csv and avro data files".I will describe it for you
> and expecting your advises.

Re: My mainly idea about implement data communicate tool between json/xml/csv and avro data files

Posted by Peng Cui <aj...@gmail.com>.
Hi all,

I have almost finished my GSoC proposal for the project "a data communicate
tool between json/xml/csv and avro data files". I describe it below and
look forward to your advice.

The tool has two main parts:

1. Data communication module, i.e. read/write json/xml/csv records from/to
Avro data files
There are two steps:

Step one: convert json/xml/csv records to and from Avro datums

For JSON:
Avro already supplies ParsingDecoder and JsonGenerator, so we can use these
two classes to move data between Avro datums and JSON data.

For XML:
I must extend the abstract ParsingDecoder to build an XMLDecoder class that
parses data from an XML file and converts it to Avro datums. An
XMLGenerator class that converts Avro datums to an XML data file is also
necessary. This part needs some XML parsing; Apache Xerces may be a good
choice, and fortunately I am familiar with it.

For CSV:
Likewise, I must build a CSVDecoder to convert CSV data to Avro datums and
a CSVGenerator class to convert Avro datums to CSV files. This part needs
some CSV handling; I think Apache Commons CSV can help us.

Step two: read/write Avro datums from/to Avro data files

Avro has implemented this already, so it will not cost me much time and
energy.

2. Command-tool interface design

Basic interface design:

The tool is based on Java Swing. It is made up of a command input text area
and an information output panel that shows the current status, command
execution results, data output, etc.

Command system design:
1). Each command is a class that implements an interface called
BasicCommand, which has an execute function. Each command implementation
class must implement its concrete operations in the execute function.
2). An XML configuration file registers command classes with the command
system. At the beginning the tool will have some basic commands (introduced
below). In the future, to add a command to the tool, we write the
corresponding command class and then register it.
3). During initialization, the tool will parse the command configuration
XML file, instantiate the command classes, and load them into the context.
It will use an ArrayList to store all the system commands while running.
4). When the user enters a command, the tool traverses the command list. If
the command exists and its arguments have the correct format, the tool
executes it (that is, invokes the command instance's execute function). If
the command exists but the arguments do not match its declaration, the tool
prints usage information for the command. If the tool cannot find the
command, it tells the user "the command is not an available command".
5). The tool uses an XML configuration file to store system attributes,
such as the default workspace, the default work mode (json, xml or csv),
the output font size, etc.
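
The command system above could be sketched as follows. Only BasicCommand,
its execute function, and the ArrayList come from the description; the
methods name(), accepts() and usage(), the ModeCommand example, and
registration by class name (standing in for the XML configuration file) are
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the command registry and dispatch loop.
final class CommandSystemSketch {
    /** BasicCommand as named in the proposal; all methods but execute are assumed. */
    interface BasicCommand {
        String name();
        boolean accepts(List<String> args);
        String usage();
        String execute(List<String> args);
    }

    private final List<BasicCommand> commands = new ArrayList<>();

    /** Instantiates a command class by name, as a config-file entry would. */
    void registerByClassName(String className) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            commands.add((BasicCommand) instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot register " + className, e);
        }
    }

    /** Traverses the list, then executes, prints usage, or reports an unknown command. */
    String dispatch(String name, List<String> args) {
        for (BasicCommand command : commands) {
            if (command.name().equals(name)) {
                return command.accepts(args)
                        ? command.execute(args)
                        : "usage: " + command.usage();
            }
        }
        return "the command is not an available command";
    }

    /** Example command: switches the work mode to json, xml, or csv. */
    static final class ModeCommand implements BasicCommand {
        private String mode = "json";

        public String name() { return "mode"; }

        public boolean accepts(List<String> args) {
            return args.size() == 1 && Arrays.asList("json", "xml", "csv").contains(args.get(0));
        }

        public String usage() { return "mode json|xml|csv"; }

        public String execute(List<String> args) {
            mode = args.get(0);
            return "mode set to " + mode;
        }
    }

    public static void main(String[] args) {
        CommandSystemSketch tool = new CommandSystemSketch();
        tool.registerByClassName(ModeCommand.class.getName());
        System.out.println(tool.dispatch("mode", Arrays.asList("csv")));
    }
}
```

A Map keyed by command name would avoid the linear traversal, but the list
is kept here to match the description.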

System initialization commands design:
1). Workspace setup command;
2). Get workspace history command;
3). Work mode change command;
4). List data files command;
5). Data output command:
This command works differently in each work mode. For example, in json
mode the data is output as a JSON string, but in xml mode it is output as
an XML file. The user can also select a specific output mode with an
argument; the default output mode is the current working mode.
This command can also select the output stream: export the data into a data
file or just show it in the tool interface. The default output stream is
the tool interface.
6). Data input command:
This command is used to input data and convert it to an Avro data file. It
has four work modes, which the user selects with a command argument:

mode 1: input schema data and content data from an IO device;
mode 2: input schema data from an IO device but content data from a data
file on the local disk;
mode 3: input schema data from a data file on the local disk but content
data from an IO device;
mode 4: input schema data and content data from data files on the local
disk.
The default work mode is mode 1. When the user enters this command and
presses enter, a graphical Swing panel shows up, and the user can finish
the input in this panel. Each command mode brings up a different Swing
input panel, four in all.
7). Basic system setup command; this may include setting the font, font
size, color, etc.
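
The four modes of the data input command are the combinations of two
sources for the schema and the content, which could be modelled as below.
The type names and the convention that the argument value is the digit 1 to
4 are assumptions; the proposal does not fix the argument syntax.

```java
// Hypothetical model of the data input command's four work modes.
final class InputModeSketch {
    /** Where a piece of input comes from. */
    enum Source { IO_DEVICE, LOCAL_FILE }

    /** Each mode pairs a schema source with a content source. */
    enum InputMode {
        MODE_1(Source.IO_DEVICE, Source.IO_DEVICE),
        MODE_2(Source.IO_DEVICE, Source.LOCAL_FILE),
        MODE_3(Source.LOCAL_FILE, Source.IO_DEVICE),
        MODE_4(Source.LOCAL_FILE, Source.LOCAL_FILE);

        final Source schemaSource;
        final Source contentSource;

        InputMode(Source schemaSource, Source contentSource) {
            this.schemaSource = schemaSource;
            this.contentSource = contentSource;
        }

        /** Parses the (assumed) argument value "1".."4"; no argument means mode 1. */
        static InputMode fromArgument(String arg) {
            if (arg == null || arg.isEmpty()) {
                return MODE_1; // default work mode
            }
            switch (arg) {
                case "1": return MODE_1;
                case "2": return MODE_2;
                case "3": return MODE_3;
                case "4": return MODE_4;
                default:
                    throw new IllegalArgumentException("unknown input mode: " + arg);
            }
        }
    }

    public static void main(String[] args) {
        InputMode mode = InputMode.fromArgument("2");
        System.out.println(mode + ": schema from " + mode.schemaSource
                + ", content from " + mode.contentSource);
    }
}
```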

These are my main ideas. If anyone has advice or suggestions, please let
me know. Thank you :-)

Peng
On Mon, Mar 22, 2010 at 1:31 PM, Peng Cui <aj...@gmail.com> wrote:

> Hi Doug,
>
> My name is Cui Peng. I want to implement the data communicate tool between
> json/xml/csv and avro data files as you described in the GSoC 2010 idea
> list. I exported AVRO source code,research its design and architect,then i
> got mainly idea about the tool, then i will show it to you,and expecting
> your advises :-)