Posted to dev@avro.apache.org by "Philip Zeyliger (JIRA)" <ji...@apache.org> on 2009/12/01 05:50:20 UTC

[jira] Created: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Commandline utility for converting to and from Avro's binary format.
--------------------------------------------------------------------

                 Key: AVRO-245
                 URL: https://issues.apache.org/jira/browse/AVRO-245
             Project: Avro
          Issue Type: New Feature
          Components: java
            Reporter: Philip Zeyliger
            Assignee: Philip Zeyliger
            Priority: Minor


A utility for avrotool that can convert between Avro binary data and the JSON textual form.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784531#action_12784531 ] 

Doug Cutting commented on AVRO-245:
-----------------------------------

Hmm.  I don't see that folks often have independent binary bits of a file or a tcp dump in a file.  And I fear that folks will use this as a generic tool for reading/writing Avro data to files, which it should not be.  Am I too paranoid?


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784559#action_12784559 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

BTW, I'm annoyed at talking about it, so I'm writing the "avrocat" tool right now.


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784571#action_12784571 ] 

Doug Cutting commented on AVRO-245:
-----------------------------------

> BTW, I'm annoyed at talking about it, so I'm writing the "avrocat" tool right now. 

It worked!



[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784514#action_12784514 ] 

Doug Cutting commented on AVRO-245:
-----------------------------------

> I'm just doing these one at a time.

So do you intend to extend this tool or add a new tool for that?  I think we shouldn't encourage the generation of Avro data files that are not in the Avro data file format, so I'd prefer only one tool.  Since the data file contains the schema, one's not needed on the command line.  If provided, it should be used for projection.  If reading JSON input, one datum per line, with no schema provided on the command line, the first line of the file could be assumed to be the schema.
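
A minimal sketch of the "first line is the schema" convention proposed above, using only the JDK.  The class and method names (JsonLinesInput, read) are hypothetical and are not part of the patch; the schema is kept as raw text rather than parsed, so this only shows how the input would be split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the patch: splits line-oriented JSON input
// into a schema and a list of datum lines.
class JsonLinesInput {
  final String schemaJson;        // schema text, from the argument or from line 1
  final List<String> datumLines;  // remaining non-blank lines, one JSON datum each

  JsonLinesInput(String schemaJson, List<String> datumLines) {
    this.schemaJson = schemaJson;
    this.datumLines = datumLines;
  }

  /** schemaArg is the schema given on the command line, or null if none was given. */
  static JsonLinesInput read(BufferedReader in, String schemaArg) throws IOException {
    String schema = schemaArg;
    if (schema == null) {
      schema = in.readLine();  // no schema argument: treat the first line as the schema
      if (schema == null) {
        throw new IOException("empty input: expected a schema on the first line");
      }
    }
    List<String> data = new ArrayList<>();
    for (String line = in.readLine(); line != null; line = in.readLine()) {
      if (!line.trim().isEmpty()) {
        data.add(line);
      }
    }
    return new JsonLinesInput(schema, data);
  }

  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new StringReader("\"int\"\n1\n2\n3\n"));
    JsonLinesInput parsed = read(in, null);
    System.out.println(parsed.schemaJson + " -> " + parsed.datumLines);
    // prints: "int" -> [1, 2, 3]
  }
}
```

When a schema is passed explicitly, every line is treated as data, which is what a projection schema would need.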


[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-245:
---------------------------------

    Attachment: AVRO-245.patch.txt

Ok, I've added "data_file_read" and "data_file_write"; also "data_file_get_schema".  I didn't do the thing where the schema is the first line (in part because I want that to be turn-offable, and we need commandline parsing for that).  I've renamed the json_to_binary tool to json_to_binary_fragment.

I added two classes to hold some utilities, Tool/Util.java, and TestingUtil.java, for tiny things that made sense.  (It's future work to use them more widely, but I didn't want to clutter this patch.)  I wanted to name the second one TestUtil, but then junit would run it.  Even so, I had to update build.xml to only include tests for filenames Test[A-Z]* (instead of Test*), but I figure that's alright.  I'm very open to other naming suggestions.

If folks prefer, I could break up the data_file and the data_fragment stuff across two JIRAs.

For now, I'm continuing to do tests both in Java and in the shell script.  That's getting a bit tiresome, admittedly.

Here's the set of tool commands with this change:

{noformat}
$ /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home/bin/java -jar /Users/philip/src/avro/build/avroj-1.2.0-dev.jar 
Available tools:
binary_fragment_to_json  Converts binary data to JSON.
                compile  Generates Java code for the given schema.
   data_file_get_schema  Prints out schema of an Avro data file.
         data_file_read  Dumps the contents of an Avro data file as JSON, one record per line.
        data_file_write  Reads in JSON (one record per line), and writes to an avro data file.
                 induce  Use reflection to induce a schema from a class or a protocol from an interface.
json_to_binary_fragment  Converts text data to an Avro binary fragment.
{noformat}

AVRO-160 was very noticeable when I wrote "data_file_read", since you can't read an avro data file from stdin: you need something seekable.


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784458#action_12784458 ] 

Doug Cutting commented on AVRO-245:
-----------------------------------

Since there's no code in the containing class, I see no point to having it.  Rather we should just have JsonToBinaryTool and BinaryToJsonTool.

Also, these should be more symmetric, both either taking their input from a file, from the command line or from standard input.  My preference would probably be either a named file or standard in/out if that's not provided.

You're not using the container file format, but rather just a file containing a single record.  A tool that takes json lines from standard input and emits an avro data file and vice versa would be useful, no?

Finally, it doesn't look like you close the file you open for write.


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784558#action_12784558 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

I'm fine with "binary_fragment_to_json".  Feel free to do the regexp on the patch.

Only time will tell, but I'm willing to bet (you know, a milkshake, perhaps) that folks will store avro data in non-condoned file formats.  Like databases (either RDBMS or key-value like BDB) or SequenceFiles, or whatever.  Someone on the user list just asked about this recently, too.

If you'd prefer to wait for a data file reader/writer to check this in, we could have had one written already :)

-- Philip


[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-245:
---------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784522#action_12784522 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

I intend to write a separate tool for data files.  (And, yes, that record would only need the schema for projection.)

I'm mainly interested in exposing this as (a) a way for folks to learn and understand the encoding, in a hands-on fashion, (b) a way to debug a stream of bytes that is typically part of a greater whole (one value in a SequenceFile, part of a tcpdump, etc), and (c) as a step in sending arbitrary RPCs around.
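
As an illustration of the kind of encoding detail such a tool lets one poke at: Avro's binary encoding writes int and long values as zig-zag, variable-length integers.  The sketch below reimplements that for a long with the JDK only; the class name ZigZagVarint is hypothetical and this is not the code from the patch or from Avro's own encoder.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-alone illustration of zig-zag varint coding for a long.
class ZigZagVarint {
  /** Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...; the result is written 7 bits at a time. */
  static byte[] encodeLong(long n) {
    long z = (n << 1) ^ (n >> 63);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));  // low 7 bits, high bit set: more bytes follow
      z >>>= 7;
    }
    out.write((int) z);                       // final byte, high bit clear
    return out.toByteArray();
  }

  /** Inverse of encodeLong; reads bytes until one has its high bit clear. */
  static long decodeLong(byte[] bytes) {
    long z = 0;
    int shift = 0;
    for (byte b : bytes) {
      z |= (long) (b & 0x7F) << shift;
      shift += 7;
      if ((b & 0x80) == 0) {
        break;
      }
    }
    return (z >>> 1) ^ -(z & 1);              // undo the zig-zag mapping
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    for (long n : new long[] {0, -1, 1, -64, 64}) {
      System.out.println(n + " -> " + hex(encodeLong(n)));
    }
    // prints: 0 -> 00, -1 -> 01, 1 -> 02, -64 -> 7f, 64 -> 8001 (one per line)
  }
}
```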


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790536#action_12790536 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

bq. i think ant uses glob-like patterns, not regex. instead i just named the file AvroTestUtil.java.

Cool.  I intended [A-Z]* to be glob, not regex, and I thought it worked locally, but perhaps I was deluding myself.
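
To make the glob-versus-regex distinction concrete, here is a small JDK-only sketch.  It does not use Ant's pattern matcher (which is its own dialect); it only contrasts what the characters Test[A-Z]* mean when read as a regex with a hand-written regex for the intended glob meaning.  The class name and the file name TestDataFile.java are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: the same text read as a regex vs. as the intended glob.
class GlobVersusRegex {
  // Read as a regex: "Test" followed by zero or more capital letters, nothing else.
  private static final Pattern AS_REGEX = Pattern.compile("Test[A-Z]*");
  // Intended glob meaning, written by hand as a regex:
  // "Test", exactly one capital letter, then any characters.
  private static final Pattern GLOB_MEANING = Pattern.compile("Test[A-Z].*");

  static boolean matchesAsRegex(String name) {
    return AS_REGEX.matcher(name).matches();
  }

  static boolean matchesIntendedGlob(String name) {
    return GLOB_MEANING.matcher(name).matches();
  }

  public static void main(String[] args) {
    for (String name : new String[] {"TestDataFile.java", "TestingUtil.java"}) {
      System.out.println(name
          + " regex=" + matchesAsRegex(name)
          + " glob-meaning=" + matchesIntendedGlob(name));
    }
    // prints:
    // TestDataFile.java regex=false glob-meaning=true
    // TestingUtil.java regex=false glob-meaning=false
  }
}
```

Under the intended meaning, a test class is picked up and TestingUtil.java is skipped, which is the behavior the build.xml change was after.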

bq. // FIXME: re-create encoder to avoid extra spaces (Jackson bug?)

I think this comment isn't quite enough to figure out what the fix-me is implying.  What's your TODO/FIXME convention?  (Perhaps that comment was intended for yourself, and intended to never be committed.)

I wandered into the Jackson code when I ran into this, and it's reasonably set up to write one JSON object, and we're writing many.  So I don't think they'd say it was a bug: I think they'd say we should use a different JsonGenerator for every bit of JSON we write.

{noformat}
      while (true) {
        try {
          datum = reader.read(null, decoder);
        } catch (AvroRuntimeException e) {            // FIXME: at EOF
{noformat}
It bugs me that this works.  An example on which it ought to fail is the JSON data "1 2 3\n" (note: no newlines between records) against the schema "int".  This ought to throw an error.

The core issue is that we've got two different things going on: we're both line-oriented and JSON-oriented.  We should check that the JSON on every line is well-formed, and the code fails to.  (My original code was broken too: when I wrote the test, it didn't throw an error for the malformed data; just read one entry and went on; also StringInputStream was from ant, which shouldn't even be on avroj's classpath.)

One way to avoid this mess is to require that the input file be a JSON array.  So "[1, 2, 3]" (with arbitrary whitespace).  I think this makes it harder to use line-oriented unix tools with this, but it does solve both problems.  What do you think?  

It also worries me every time JsonDecoder calls "in.nextToken();" without checking that the value it got was expected (typically "null" or possibly END_ARRAY or END_OBJECT).  It doesn't seem that using the ValidatingDecoder makes it check that, but I could be wrong.
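
The per-line well-formedness check argued for above can be sketched for the "int" schema as follows.  This is a JDK-only stand-in: a regex for JSON integer syntax takes the place of a real JSON parser, and the class name IntLines is hypothetical.  The point is only that "1 2 3" on one line is rejected instead of being read as a single 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for "one well-formed JSON datum per line" with schema "int".
class IntLines {
  // JSON integer syntax: optional minus sign, no leading zeros.
  private static final Pattern JSON_INT = Pattern.compile("-?(0|[1-9][0-9]*)");

  /** Parses one int per line; throws if a line holds anything other than exactly one int. */
  static List<Integer> parse(String input) {
    List<Integer> values = new ArrayList<>();
    String[] lines = input.split("\n");
    for (int i = 0; i < lines.length; i++) {
      String text = lines[i].trim();
      if (text.isEmpty()) {
        continue;  // tolerate blank lines
      }
      if (!JSON_INT.matcher(text).matches()) {
        throw new IllegalArgumentException(
            "line " + (i + 1) + ": expected exactly one int, got: " + text);
      }
      // Out-of-range values raise NumberFormatException, an IllegalArgumentException.
      values.add(Integer.parseInt(text));
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println(parse("1\n2\n3\n"));  // prints: [1, 2, 3]
    try {
      parse("1 2 3\n");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```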




[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785652#action_12785652 ] 

Doug Cutting commented on AVRO-245:
-----------------------------------

I think you attached the old version of the patch.


[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-245:
------------------------------

    Attachment: AVRO-245.patch

Here's a version of this that, instead of adding a checkEof() call in each method prior to the call to parser.advance(), adds an advance() method that checks for EOF and then calls parser.advance().
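
The shape of that refactoring, sketched with JDK classes only: one private advance() method performs the EOF check and then delegates, so the individual read methods do not each need their own check.  TokenDecoder and its Iterator-based "parser" are hypothetical stand-ins, not the JsonDecoder code in the patch.

```java
import java.io.EOFException;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a decoder sitting on top of a token stream.
class TokenDecoder {
  private final Iterator<String> tokens;  // plays the role of the underlying parser

  TokenDecoder(Iterator<String> tokens) {
    this.tokens = tokens;
  }

  /** Single choke point: checks for EOF, then advances the underlying parser. */
  private String advance() throws EOFException {
    if (!tokens.hasNext()) {
      throw new EOFException("no more input");
    }
    return tokens.next();
  }

  // The read methods call advance() and carry no EOF logic of their own.
  int readInt() throws EOFException {
    return Integer.parseInt(advance());
  }

  boolean readBoolean() throws EOFException {
    return Boolean.parseBoolean(advance());
  }

  public static void main(String[] args) {
    TokenDecoder decoder = new TokenDecoder(Arrays.asList("1", "2", "3").iterator());
    try {
      while (true) {
        System.out.println(decoder.readInt());
      }
    } catch (EOFException e) {
      System.out.println("done: " + e.getMessage());
    }
  }
}
```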


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784501#action_12784501 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

bq. Since there's no code in the containing class, I see no point to having it. Rather we should just have JsonToBinaryTool and BinaryToJsonTool. 

Done.  I kept both tests in the same class, because it was convenient to share constants.

bq. Also, these should be more symmetric, both either taking their input from a file, from the command line or from standard input. My preference would probably be either a named file or standard in/out if that's not provided.

Changed so that it uses a named file, and "-" implies stdin.  Agreed that the symmetric form looks better.

bq. You're not using the container file format, but rather just a file containing a single record. A tool that takes json lines from standard input and emits an avro data file and vice versa would be useful, no?

Absolutely.  I'm just doing these one at a time.

bq. Finally, it doesn't look like you close the file you open for write.

Good catch: I didn't close the file I opened for _read_.  I don't think one is supposed to close System.out and System.in, so I ended up having to bifurcate the code there.
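
A minimal sketch of that bifurcation, assuming the convention described above that "-" means standard input.  The class and method names (InputArg, readAll, isStdin) are hypothetical, and try-with-resources is used for brevity even though it postdates this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: a named file is opened and closed; "-" reads System.in and leaves it open.
class InputArg {
  static boolean isStdin(String arg) {
    return "-".equals(arg);
  }

  static byte[] readAll(String arg) throws IOException {
    if (isStdin(arg)) {
      return drain(System.in);  // deliberately not closed: the process owns System.in
    }
    try (InputStream in = new FileInputStream(arg)) {
      return drain(in);         // the file stream is closed when this block exits
    }
  }

  private static byte[] drain(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    for (int n = in.read(buffer); n >= 0; n = in.read(buffer)) {
      out.write(buffer, 0, n);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    String source = args.length > 0 ? args[0] : "-";
    System.out.println("read " + readAll(source).length + " bytes");
  }
}
```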


[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-245:
---------------------------------

    Attachment: AVRO-245.patch.txt

Attaching a patch.  I put both the "to" and the "from" tools as inner classes of the same outer class.  I could be convinced that they should just be top-level classes.


[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-245:
------------------------------

       Resolution: Fixed
    Fix Version/s: 1.3.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Philip!


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784539#action_12784539 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

bq. Am I too paranoid?

Yes?

As you've said, the essential kernel of Avro is to read/write binary data based on a json schema.  That's exposed enough in the Java APIs, and there's little reason not to expose it on the command line.

I understand that you're concerned about people leaving schemaless files around.  We can add a warning in the description, perhaps.  Because the tool works on individual records only, I don't think folks will shoot themselves in the foot too much.

-- Philip


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784549#action_12784549 ] 

Doug Cutting commented on AVRO-245:
-----------------------------------

My paranoia aside, I have a much harder time imagining use cases for this than I do for a data file tool.  I am not in the habit of grabbing random binary portions of files or packets and trying to decode them, nor can I imagine that others are.  I am interested in rendering a binary data file as text.  I think the tool named binary_to_json should operate on data files.  Perhaps we might support a binary_data_fragment_to_json, but I'd much rather see the data file utility first.  As you mention, if folks really have a binary fragment, they can use the java or some other API, but I don't see a command-line shell tool use case for this.


[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-245:
---------------------------------

    Attachment: AVRO-245.patch.txt


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793388#action_12793388 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

Ping?


[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795491#action_12795491 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

+1.  Works for me.




[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-245:
------------------------------

    Attachment: AVRO-245.patch

I hacked on this patch some last week.  Changes I made:
 - the build.xml change you made didn't work: I think ant uses glob-like patterns, not regex.  Instead I just named the file AvroTestUtil.java.
 - I used more generic types for local variables when possible, e.g., Encoder rather than JsonEncoder, etc.
 - the NullDatumReader in GetSchemaTool wasn't needed; GenericDatumReader could be used.
 - I tried hard not to create new JSON decoders or encoders per line in the DataFile read/write tools, since these are not lightweight.  In the reader we need a reliable, precise way of detecting EOF from a decoder.  Perhaps it should be fixed to throw EOFException, or perhaps the tool should keep track of its position in the input and stop when it reaches the end.  In the writer there seems to be a bug in Jackson that emits an extra space.  We should pursue this and figure out what's going on, since we need these tools to perform reasonably well.  The attached patch passes tests and creates fewer objects, but this area still needs more work, I fear.




[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-245:
------------------------------

    Attachment: AVRO-245.patch

Here's a new version that fixes things so that we can catch EOFException rather than AvroTypeException when attempting to read a JsonDecoder past EOF.
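
A minimal sketch of the kind of read loop this makes possible, using only java.io rather than Avro's classes.  DataInputStream stands in for the decoder, and the class and method names (EofLoopSketch, readAll, encode) are made up for illustration; none of this is taken from the patch.  The point is that one reader is created outside the loop and end of input arrives as its own exception type, so it is not confused with a genuine decoding error.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: DataInputStream plays the role of a reusable decoder. */
class EofLoopSketch {

    /** Reads 4-byte ints until the input is exhausted, reusing one reader. */
    static List<Integer> readAll(byte[] data) {
        List<Integer> out = new ArrayList<>();
        // Created once, outside the loop, and reused for every datum.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            while (true) {
                out.add(in.readInt());
            }
        } catch (EOFException e) {
            // Normal termination: the reader ran off the end of the input.
        } catch (IOException e) {
            // Any other I/O failure is a real error and is not swallowed.
            throw new UncheckedIOException(e);
        }
        return out;
    }

    /** Helper that encodes ints big-endian so the sketch is self-contained. */
    static byte[] encode(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(readAll(encode(1, 2, 3)));
    }
}
```

Note that in this sketch a truncated final value also ends the loop quietly, because readInt throws EOFException when fewer than four bytes remain.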




[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-245:
---------------------------------

    Attachment: AVRO-245.patch.txt

AVRO-263 required minor modifications (adding return values) to this patch.  Trivial update to the patch.

(For my own confusion-keeping, this is d6d16d19ab07dbc167797eea05c3b2d09c740a10).




[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-245:
---------------------------------

    Attachment: AVRO-245.patch.txt

Attaching a new version of the patch.

The utilities have been given shorter names (see below).  

The debate about how to deal with newlines actually had an elegant solution: we simply read in JSON records, no matter how they're delimited.  So "1 2 3" and "1\n2\n3\n" would be read in the same way.  Newline is still standard (and used by the "tojson" tool), but there is no need to special-case it.  Furthermore, parser.init() is called at every record to reset the JsonGenerator, which had been causing extra spaces to occur.
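
To illustrate delimiter-agnostic reading, here is a rough, hand-rolled sketch in plain Java.  It is not the Jackson-based code in the patch; the class and method names (JsonRecordSplitter, splitRecords) are hypothetical, and it assumes well-formed input.  It tracks nesting depth and string state so that any top-level whitespace, a newline or a space, ends a record.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hand-rolled splitter, not the code from the patch. */
class JsonRecordSplitter {

    /** Splits a stream of JSON values into one string per top-level value. */
    static List<String> splitRecords(String input) {
        List<String> records = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int depth = 0;            // nesting level of {} and []
        boolean inString = false; // currently inside a JSON string literal
        boolean escaped = false;  // previous char was a backslash in a string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                cur.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                    if (depth == 0) flush(cur, records); // bare string record
                }
                continue;
            }
            if (Character.isWhitespace(c)) {
                if (depth == 0) flush(cur, records); // ends a top-level scalar
                else cur.append(c);                  // kept inside a record
                continue;
            }
            boolean opener = (c == '{' || c == '[' || c == '"');
            if (depth == 0 && opener) flush(cur, records); // scalar then opener
            cur.append(c);
            if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
                if (depth == 0) flush(cur, records); // record closed
            }
        }
        // A complete trailing scalar is kept; a truncated record is dropped
        // (a choice made by this sketch, not necessarily by the tool).
        if (depth == 0 && !inString) flush(cur, records);
        return records;
    }

    private static void flush(StringBuilder cur, List<String> records) {
        if (cur.length() > 0) {
            records.add(cur.toString());
            cur.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Both delimiters yield the same list of records.
        System.out.println(splitRecords("1 2 3").equals(splitRecords("1\n2\n3\n")));
    }
}
```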

This patch also inserted a bunch of checkEof() calls in JsonParser.  Ideally those wouldn't need to be at every line, but, practically, it turns out that parser.advance() calls back into JsonParser, which calls in.nextToken().  An alternative is to just ignore partial records at the end of files.  That would need fewer ugly lines.

{noformat}
Available tools:
   compile  Generates Java code for the given schema.
fragtojson  Converts binary data to JSON.
  fromjson  Reads in JSON (one record per line), and writes to an avro data file.
 getschema  Prints out schema of an Avro data file.
    induce  Use reflection to induce a schema from a class or a protocol from an interface.
jsontofrag  Converts text data to an Avro binary fragment.
    tojson  Dumps the contents of an Avro data file as JSON, one record per line.
{noformat}




[jira] Commented: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790857#action_12790857 ] 

Doug Cutting commented on AVRO-245:
-----------------------------------

> What's your TODO/FIXME convention?

I don't have a strong convention.  Do you?

What I meant here is that creating a new encoder per datum is unacceptable.  A JsonEncoder compiles the schema as a grammar and is meant to be reused.  Jackson's JsonGenerator is reusable, but unfortunately it inserts a space before all but the first item, for some unknown reason, which is at least a misfeature for our purposes.  Looking at the thrift-protobuf-compare benchmarks, they do create a new JsonGenerator per datum, so they must be lightweight.  But we still need to avoid re-compiling the grammar per datum.

> The core issue is that we've got two different things going on: we're both line-oriented and JSON-oriented.

You're right.  I munged this together.

So perhaps we should consider lines the container: parse them first, then parse the JSON within them, as your patch did.  But we should not create a new Decoder per line, since it also compiles the grammar.

To address both of these, perhaps we should add methods:

static Parser JsonEncoder#parse(Schema);
JsonEncoder(Parser, OutputStream);
static Parser JsonDecoder#parse(Schema);
JsonDecoder(Parser, InputStream);

Then we could create the parser once outside the loop and then re-create lightweight objects within the loop and hope that doesn't hurt performance much. My first choice would be to make encoders and decoders reusable, but that does not appear possible currently with Jackson.
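
The shape of that proposal can be sketched without Avro at all.  In the sketch below, Grammar and Encoder are hypothetical stand-ins for the compiled Parser and a lightweight JsonEncoder; the names, the int-only write method, and the compile counter are assumptions made purely for illustration.  The expensive step runs once outside the loop, and only cheap objects are created per datum.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-ins only; none of these are Avro classes. */
class ParseOnceSketch {

    /** Stand-in for the compiled grammar (the "Parser" in the proposal). */
    static final class Grammar {
        static int compileCount = 0; // counts how often the costly step runs
        final String schema;

        private Grammar(String schema) {
            this.schema = schema;
        }

        /** The expensive step: done once per schema, outside the loop. */
        static Grammar parse(String schema) {
            compileCount++;
            return new Grammar(schema);
        }
    }

    /** Cheap per-datum object that borrows an already-compiled grammar. */
    static final class Encoder {
        private final Grammar grammar;
        private final StringBuilder out;

        Encoder(Grammar grammar, StringBuilder out) {
            this.grammar = grammar;
            this.out = out;
        }

        void write(int datum) {
            // A real encoder would consult the grammar to validate the datum;
            // this sketch only writes one value per line.
            out.append(datum).append('\n');
        }
    }

    /** Compiles the schema once, then creates a light encoder per datum. */
    static String encodeAll(String schema, List<Integer> data) {
        Grammar grammar = Grammar.parse(schema);     // once
        StringBuilder out = new StringBuilder();
        for (int datum : data) {
            new Encoder(grammar, out).write(datum);  // lightweight, per datum
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(encodeAll("\"int\"", Arrays.asList(1, 2, 3)));
        System.out.println("compiled " + Grammar.compileCount + " time(s)");
    }
}
```

Whether per-datum construction is cheap enough in practice would still have to be measured against the real encoder and decoder.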

> It doesn't seem that using the ValidatingDecoder makes it check that, but i could be wrong.

I believe that the Json tokens are actually strictly checked.





[jira] Updated: (AVRO-245) Commandline utility for converting to and from Avro's binary format.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-245:
---------------------------------

    Attachment: AVRO-245.patch.txt

Whoops, got my JIRA numbers confused.  Here's hopefully the updated version.

