Posted to dev@avro.apache.org by "Christophe Taton (JIRA)" <ji...@apache.org> on 2012/05/31 01:23:23 UTC

[jira] [Created] (AVRO-1105) Scala API for Avro

Christophe Taton created AVRO-1105:
--------------------------------------

             Summary: Scala API for Avro
                 Key: AVRO-1105
                 URL: https://issues.apache.org/jira/browse/AVRO-1105
             Project: Avro
          Issue Type: New Feature
            Reporter: Christophe Taton


Umbrella issue.
Goal is to provide Scala friendly APIs for Avro records and protocols (RPCs).

Related project: http://code.google.com/p/avro-scala-compiler-plugin/ looks dead (no change since Sep 2010).


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-1105) Scala API for Avro

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505952#comment-13505952 ] 

Scott Carey commented on AVRO-1105:
-----------------------------------

@Quinn: Based on a quick look at GitHub, it seems that the implementation is code-gen based and wraps the existing Java implementation for most of its work.  Is that correct?  That is fine; code gen is a common use case with Avro (along with two other common cases, which I'll discuss shortly). As you indicate, there are Scala devs who would like to use it.  We don't have to start out with all use cases available, or with a pure Scala implementation.

h6. Common Avro use patterns
There are three common patterns for interacting with Avro data from code:
* "Schema First" (e.g. code gen) : Schemas are managed outside of the code, and shared across products / languages.  These generally represent business objects and result in pure-data classes available to the programmer.
* "Code First" (e.g. reflection) : The canonical representation for data is in code, and Avro schemas are generated based on that code for persistence of data and schema evolution.
* "Dynamic" (e.g. Java generic API) : Code has no a priori knowledge of schemas and programs interpret Avro data dynamically based on inputs or directives.  

- Schema first patterns work well with long-lived data types and applications built to work directly with those data types or exchange them with other applications.  These applications often want to expose the data types to the programmer directly (e.g. make record 'Foo' appear as class "Foo" with field accessors named the same as the fields, for compile-time safety).
- Code first patterns have low programming overhead and fit well with agile use cases, prototypes, or situations where a single language can host the canonical representation of a long-lived data type.
- Dynamic patterns are required for general data processing and storage, generic data access and transformation tools, or any other use case where requiring the programmer to know in advance the schemas passing through the system is impossible or a burden.  
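
As a rough illustration of the three patterns, here is a toy sketch in plain Scala. It does not use the Avro library at all: FieldSpec and RecordSpec are hypothetical stand-ins for an Avro schema, Foo stands in for a generated class, and deriveSpec and readField are made-up helpers. It assumes Scala 2.13 or later (for productElementNames), which is newer than the versions discussed in this thread.

```scala
// Toy model of the three usage patterns. No Avro library is involved, and
// every name in this object is hypothetical.
object AvroPatternsSketch {

  // Stand-in for an Avro record schema: a record name plus typed fields.
  final case class FieldSpec(name: String, tpe: String)
  final case class RecordSpec(name: String, fields: List[FieldSpec])

  // 1. Schema first: the schema is managed outside the code...
  val fooSpec: RecordSpec =
    RecordSpec("Foo", List(FieldSpec("id", "long"), FieldSpec("label", "string")))

  // ...and a code generator would emit a class like this one (written by hand
  // here), so that field access is checked at compile time.
  final case class Foo(id: Long, label: String)

  // 2. Code first: the class is canonical and the schema is derived from it.
  // This toy version inspects one instance at run time and only knows two
  // primitive types; anything else falls back to the runtime class name.
  def deriveSpec(record: Product): RecordSpec = {
    val fields = record.productElementNames.zip(record.productIterator).map {
      case (name, _: Long)   => FieldSpec(name, "long")
      case (name, _: String) => FieldSpec(name, "string")
      case (name, other)     => FieldSpec(name, other.getClass.getSimpleName)
    }.toList
    RecordSpec(record.productPrefix, fields)
  }

  // 3. Dynamic: the program learns the schema only at run time and looks
  // fields up by name, with no compile-time knowledge of the record type.
  def readField(spec: RecordSpec, values: Map[String, Any], field: String): Option[Any] =
    if (spec.fields.exists(_.name == field)) values.get(field) else None

  def main(args: Array[String]): Unit = {
    val foo = Foo(1L, "first")
    println(deriveSpec(foo) == fooSpec)
    println(readField(fooSpec, Map("id" -> 1L, "label" -> "first"), "label"))
  }
}
```

The point of the sketch is only the direction of the dependency: in schema first the class follows the schema, in code first the schema follows the class, and in the dynamic case there is no class at all.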

If this patch only addresses one of the three use cases, that is OK with me; we simply need to be clear about what it does not do, and encourage others to contribute work that completes the other use cases.  This is really a Scala code gen wrapper around the Java implementation, and we need to make clear that it is not a full language implementation -- maybe it is simply a module within the Java implementation.  
On the other hand, if there are ways to improve this work and achieve the same use cases then that is something to consider now, especially if it improves buy-in from the Scala community.

Typically, once a language has all three use case types, much of the implementation overlaps on the back-end.

@John:  This patch does not appear to address the dynamic use cases where macros and type level programming would really shine, nor any code first style.  That would require a different contribution effort.  However, for a schema first style, are Scala 2.10 macros truly an alternative to code generation?  I believe they can generate classes conforming to types defined at compile time from a schema, but are they powerful enough to inject type and field names that correspond to the schema record and field names?  I want to make sure we are talking about solving the same use cases.  
On the idiomatic Scala objection: I see a few things in the implementation that result from using the Avro Java implementation's APIs for encoding, decoding, and Schemas. Changing that does not make sense for a Scala wrapper around the Java API; I am more concerned about things that are exposed to users.
                

[jira] [Commented] (AVRO-1105) Scala API for Avro

Posted by "Quinn Slack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428552#comment-13428552 ] 

Quinn Slack commented on AVRO-1105:
-----------------------------------

I'm not an Avro committer, but I'll try out this patch and post feedback/patches. My company has several Scala developers who use Avro quite heavily, and we'd love to see native Scala codegen for Avro.
                

        

[jira] [Commented] (AVRO-1105) Scala API for Avro

Posted by "John A. De Goes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505621#comment-13505621 ] 

John A. De Goes commented on AVRO-1105:
---------------------------------------

I have reviewed the Scala patch. Although this would provide some Avro functionality for Scala, I cannot recommend the patch, because (1) it's not idiomatic Scala, (2) it uses code generation when there are far better facilities for providing the same functionality (2.10 macros or type-level programming), and (3) I cannot see the Scala community embracing this as an officially sanctioned means of Scala-Avro interop.

I know Avro needs to have broad language support, but before Avro adds functionality for some language community, I think it's essential to get buy-in from that community. I don't represent the Scala community by any stretch of the imagination, but I think a lot of Scala devs will look at the patch and think, "I'd never use that." And if I won't use it, you can bet I won't be maintaining it in the future.
                

[jira] [Updated] (AVRO-1105) Scala API for Avro

Posted by "Christophe Taton (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christophe Taton updated AVRO-1105:
-----------------------------------

    Attachment: avro-scala.patch

Here is a first shot at a compiler for Avro schemas in Scala.

This is incomplete in many ways, but I hope and believe this can be used as a starting point.
                

        

[jira] [Commented] (AVRO-1105) Scala API for Avro

Posted by "Christophe Taton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396026#comment-13396026 ] 

Christophe Taton commented on AVRO-1105:
----------------------------------------

This plugin is very appealing for users who live in a pure Scala world, but is fairly impractical in the general case:
 - the 22-field limit (due to Scala case class limitations) is not acceptable when you already have lots of legacy records;
 - Avro IDL is a much better way to declare records and protocols if you need to generate bindings in multiple languages.

                

        

[jira] [Commented] (AVRO-1105) Scala API for Avro

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396004#comment-13396004 ] 

Scott Carey commented on AVRO-1105:
-----------------------------------

From the mailing list, from Michael Armbrust:

{quote}
We have a plugin for the Scala compiler that takes case classes that extend a special marker trait (AvroRecord) and generates the code needed for Avro serialization.  It has mostly been used for research thus far, but we use it quite a bit as the serialization for our K/V store, storing experimental results, as well as our own homegrown message passing system.

Details can be found here: https://github.com/radlab/SCADS/wiki/Avro-Plugin

Let me know if you have any questions!

Michael
{quote}
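
To make the marker-trait idea concrete, here is a minimal sketch of what opting in might look like from the user's side. It is not the plugin's actual API: the AvroRecord trait below is a local, empty stand-in, Measurement and Unmarked are made-up classes, and fieldNames is a hand-written placeholder for code that a compiler plugin could generate. It assumes Scala 2.13 or later.

```scala
// Sketch of the marker-trait style. The real plugin and its trait are not
// used here; every name in this object is a hypothetical stand-in.
object MarkerTraitSketch {

  // Local stand-in for the plugin's marker trait. It declares no members;
  // its only job is to tag classes that should get serialization code.
  trait AvroRecord

  // User code: an ordinary case class opts in by extending the marker trait.
  final case class Measurement(sensor: String, value: Double) extends AvroRecord

  // A case class that does not opt in.
  final case class Unmarked(note: String)

  // Hand-written placeholder for generated code. It only accepts values that
  // carry the marker, so passing an Unmarked value is a compile-time error.
  def fieldNames(record: AvroRecord with Product): List[String] =
    record.productElementNames.toList

  def main(args: Array[String]): Unit = {
    println(fieldNames(Measurement("s1", 20.5)))
    // fieldNames(Unmarked("x"))  // would not compile: Unmarked is not an AvroRecord
  }
}
```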
                

        

[jira] [Commented] (AVRO-1105) Scala API for Avro

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425534#comment-13425534 ] 

Scott Carey commented on AVRO-1105:
-----------------------------------

Is there anyone watching who can comment on this patch?  It would be great to move this forward. Are there folks with a Scala background interested in reviewing this?  I would like to review it in more depth, but I am not a Scala expert.
                

        

[jira] [Commented] (AVRO-1105) Scala API for Avro

Posted by "Quinn Slack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504215#comment-13504215 ] 

Quinn Slack commented on AVRO-1105:
-----------------------------------

I've done some more work on Christophe's patch to get Avro Scala codegen working. We're now using it in production, although there are still some edge cases where it generates Scala code that doesn't compile (unions of complex types).

https://github.com/sqs/avro/tree/sqs-scala-2.10.0-RC2/lang/scala (the sqs-scala-2.10.0-RC2 branch contains the 2.10 version; the "sqs" branch contains the 2.9 version)

Posting it here to solicit feedback and make sure others who are interested don't repeat effort. I'll work with Christophe to push this along and prepare a patch.
                