Posted to dev@avro.apache.org by "Yang Yang (Created) (JIRA)" <ji...@apache.org> on 2011/10/04 20:41:34 UTC

[jira] [Created] (AVRO-905) make default separator in jsonEncoder to be "\n" instead of " "

make default separator in jsonEncoder to be "\n" instead of " "
---------------------------------------------------------------

                 Key: AVRO-905
                 URL: https://issues.apache.org/jira/browse/AVRO-905
             Project: Avro
          Issue Type: Improvement
            Reporter: Yang Yang
            Priority: Minor


from mailing list:

if I do

writer = new SpecificDatumWriter<SpecificRecord>(schema);
encoder = EncoderFactory.get().jsonEncoder(schema, ostream);

writer.write(my_specific_record, encoder);
writer.write(my_specific_record, encoder);


it adds a space " " between the two records, I guess as a separator.
Is it possible to remove that? Changing it to "\n" would be even better.



Doug said:
......
or you could pass EncoderFactory#jsonEncoder a JsonGenerator configured with a
MinimalPrettyPrinter whose rootValueSeparator is set to "\n".

http://jackson.codehaus.org/1.8.4/javadoc/org/codehaus/jackson/util/MinimalPrettyPrinter.html
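
To illustrate what a root value separator does, here is a minimal, library-free sketch in plain Java. It does not use Jackson or Avro; the class RootValueSeparatorDemo and its methods writeAll and readLines are hypothetical names made up for this illustration. The separator is written between top-level records only, so switching it from " " to "\n" should give one record per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a generator that writes several root-level JSON values. */
final class RootValueSeparatorDemo {

  /** Writes each record, putting the separator between records only (none after the last). */
  static String writeAll(List<String> jsonRecords, String rootValueSeparator) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < jsonRecords.size(); i++) {
      if (i > 0) {
        out.append(rootValueSeparator);
      }
      out.append(jsonRecords.get(i));
    }
    return out.toString();
  }

  /** Reads the output back one line at a time, as a line-oriented consumer would. */
  static List<String> readLines(String text) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  public static void main(String[] args) {
    List<String> records = Arrays.asList("{\"id\":1}", "{\"id\":2}");
    // Space separator: both records end up on a single line.
    System.out.println(readLines(writeAll(records, " ")).size());
    // Newline separator: one record per line.
    System.out.println(readLines(writeAll(records, "\n")).size());
  }
}
```

With the two sample records, main prints 1 for the space separator and 2 for the newline separator, which is why a line-oriented consumer can only split the records in the second case.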


+++ lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java (working copy)
@@ -31,6 +31,7 @@
 import org.codehaus.jackson.JsonEncoding;
 import org.codehaus.jackson.JsonFactory;
 import org.codehaus.jackson.JsonGenerator;
+import org.codehaus.jackson.util.MinimalPrettyPrinter;

 /** An {@link Encoder} for Avro's JSON data encoding.
 * </p>
@@ -67,11 +68,17 @@
    }
  }

+  // by default, one object per line
  private static JsonGenerator getJsonGenerator(OutputStream out)
      throws IOException {
    if (null == out)
      throw new NullPointerException("OutputStream cannot be null");
-    return new JsonFactory().createJsonGenerator(out, JsonEncoding.UTF8);
+    JsonGenerator g
+      = new JsonFactory().createJsonGenerator(out, JsonEncoding.UTF8);
+    MinimalPrettyPrinter pp = new MinimalPrettyPrinter();
+    pp.setRootValueSeparator("\n");
+    g.setPrettyPrinter(pp);
+    return g;
  }


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (AVRO-905) make default separator in jsonEncoder to be "\n" instead of " "

Posted by "Doug Cutting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-905:
------------------------------

    Attachment: AVRO-905.patch

Here's the patch.
                

[jira] [Updated] (AVRO-905) make default separator in jsonEncoder to be "\n" instead of " "

Posted by "Doug Cutting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-905:
------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Thiru, thanks for looking at this.  I changed it to use System.getProperty("line.separator") instead of "\n" since that's equivalent to the println() in DataFileReadTool that this replaces.

I committed this.
                

[jira] [Updated] (AVRO-905) make default separator in jsonEncoder to be "\n" instead of " "

Posted by "Doug Cutting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-905:
------------------------------

    Fix Version/s: 1.6.0
         Assignee: Doug Cutting
           Status: Patch Available  (was: Open)

I'll commit this soon unless there are objections.
                

[jira] [Commented] (AVRO-905) make default separator in jsonEncoder to be "\n" instead of " "

Posted by "Thiruvalluvan M. G. (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120669#comment-13120669 ] 

Thiruvalluvan M. G. commented on AVRO-905:
------------------------------------------

Looks good to me. +1

Instead of using "\n", should we use the platform's line separator (the line.separator system property)? One can argue either way. The outgoing code used println(), which inserted the platform-specific line separator, so using it now also looks appropriate. On the other hand, since the Avro content we produce should be platform-independent, one could argue that we should not use the platform's line separator. I'm fine with either.
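
The trade-off can be seen in a small plain-Java sketch, assuming only the JDK. LineSeparatorDemo, printlnOutput, and splitRecords are hypothetical names used for illustration, not part of Avro. It shows that println() appends the value of the line.separator system property, and that a reader which accepts any line break recovers the same records from either convention.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper comparing println() output with a hard-coded "\n". */
final class LineSeparatorDemo {

  /** Returns exactly what PrintStream.println(record) writes, decoded as UTF-8. */
  static String printlnOutput(String record) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      PrintStream out = new PrintStream(bytes, true, "UTF-8");
      out.println(record);
      out.flush();
      return bytes.toString("UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is required on every JVM", e);
    }
  }

  /** Splits on any line break via the regex linebreak matcher (Java 8+). */
  static List<String> splitRecords(String text) {
    return Arrays.asList(text.split("\\R"));
  }

  public static void main(String[] args) {
    String platform = System.getProperty("line.separator");
    // println() appends the platform separator, not a hard-coded "\n".
    System.out.println(printlnOutput("{\"id\":1}").equals("{\"id\":1}" + platform));
    // A tolerant reader recovers the same records from either convention.
    System.out.println(splitRecords("{\"id\":1}\n{\"id\":2}"));
    System.out.println(splitRecords("{\"id\":1}\r\n{\"id\":2}"));
  }
}
```

The bytes written differ between platforms when the platform separator is used, but a consumer that splits on any line break sees the same records either way.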
                

[jira] [Updated] (AVRO-905) make default separator in jsonEncoder to be "\n" instead of " "

Posted by "Doug Cutting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-905:
------------------------------

    Attachment: AVRO-905.patch

New version of patch that includes a test.
                

[jira] [Updated] (AVRO-905) make default separator in jsonEncoder to be "\n" instead of " "

Posted by "Doug Cutting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-905:
------------------------------

    Attachment: AVRO-905.patch

Updated version that also simplifies DataFileReadTool, since it no longer needs to flush after each item written and insert a newline.
                