You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Kevin Oliver (JIRA)" <ji...@apache.org> on 2009/12/18 20:01:21 UTC

[jira] Created: (AVRO-261) Allow Schemas to be immutable

Allow Schemas to be immutable
-----------------------------

                 Key: AVRO-261
                 URL: https://issues.apache.org/jira/browse/AVRO-261
             Project: Avro
          Issue Type: New Feature
          Components: java
            Reporter: Kevin Oliver
            Priority: Minor


It would be nice if there was the ability to have an immutable Schema in java. 

Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829153#action_12829153 ] 

Philip Zeyliger commented on AVRO-261:
--------------------------------------

How about http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/ImmutableMap.html  ?  There's an ImmutableList in there as well.  The implementations seem to be "RegularX", e.g., http://www.google.com/codesearch/p?hl=en#YXcrkXezIpQ/trunk/src/com/google/common/collect/RegularImmutableList.java&q=immutablelist.java .



> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828615#action_12828615 ] 

Thiruvalluvan M. G. commented on AVRO-261:
------------------------------------------

bq. does wrapping these not affect performance? Many operations iterate over fields or lookup enum symbols. The immutable wrapper adds another method call to each such operation. 

One of the heavy users of schema is GenericDatumReader (prior to AVRO-388). On my machine the performance of AvroGenericReader went down by 8 - 15%, depending on the kind of schema resolution used. Though the impact on GenericDatumReader will come down with AVRO-388, other users such as GenericDatumWriter will get affected.

I don't see an easy way out. In order to avoid exposing the collections directly to the clients, we may add methods like iterator(), get() etc. in the Schema itself. But then we'll have the very same problem as the unmodifiable wrappers.

With this kind of performance impact I'm reluctant to push this patch.

On the issue of setProp(),  will replacing it with addProp() (meaning one is allowed to add new properties to a schema, but not remove or replace existing properties) help? This is not quite immutability. But I don't think any code will be affected because of additional properties added.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831058#action_12831058 ] 

Doug Cutting commented on AVRO-261:
-----------------------------------

The List-based fields interface is indeed an improvement, since iterating over maps is awkard in Java.  +1 for that.

One nit: I prefer the name LockableList to LockableArrayList.  Otherwise I'm okay with this patch.

An standalone List implementation might be more clearly immutable than one based on ArrayList.  But the Google collections jar is over 500k, which would make, e.g., the tools jar that much bigger.  Copying just the ImmutableList code into this project could be a little tricky, since its not a single, standalone file.  Philip, do you want to construct a patch that copies the minimal code from Google, building a standalone ImmutableList that's not based on ArrayList?


> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch, AVRO-261.patch, AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-261:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Added documentation to about immutability. I retained the name LockableArrayList as it is indeed based on the ArrayList. In any case the name is not needed outside Schema class, so the name can be changed anytime without affecting others. (It is not private only because we have unit-tests on it).

Thanks Doug, Philip, Kevin and Scott for your contribution.

Committed revision 907913.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch, AVRO-261.patch, AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828211#action_12828211 ] 

Doug Cutting commented on AVRO-261:
-----------------------------------

Thiru: does wrapping these not affect performance?  Many operations iterate over fields or lookup enum symbols.  The immutable wrapper adds another method call to each such operation.

Philip> I also wish that we could just get rid of setFields() altogether, relying on the constructor.

The problem is that, since schemas can be circular, we need to create it first, so that we can point back to it from its fields.  So, in general, we'd need to make constructors for schemas with nested schemas (records, maps, arrays) accept names for these schemas instead of the actual schemas, and resolve these later, as a fixup pass.  So it would still be a two-step API: constructor & fixup.  The fixup could be done lazily, but that would insert runtime checks.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828679#action_12828679 ] 

Scott Carey commented on AVRO-261:
----------------------------------

Another read-only design pattern is a bit unusual but perhaps performs better than the wrapper:

A parent class is mutable, and its child class is immutable. package protected methods can pass around the mutable version, and cast freely between them.

Essentially, create 'ImmutableArrayList extends ArrayList' as the return type of the getters.  A user _could_ cast up to ArrayList and set it, but that would break the contract and wouldn't happen by accident.
If ImmutableArrayList isn't passed in when set, the mutable List passed in would have to be copied to an ImmutableArrayList.

If most of the methods inside these classes operates on ImmutableArrayList directly instead of the List interface, performance may also go up a little. However, use of this concrete class internally needs to be carefully coordinated with the public API.



> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828676#action_12828676 ] 

Philip Zeyliger commented on AVRO-261:
--------------------------------------

If dynamic schema generation is going to be a frequent use case, one of those builders that lets you chain things together might be nice.  Something like:

SchemaBuilder.newUnionSchema().addElement(...).setDoc("....").addElement(SchemaBuilder.intSchema()).finish();

Might be too over-wrought for what we're talking about now, though.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Kevin Oliver (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828335#action_12828335 ] 

Kevin Oliver commented on AVRO-261:
-----------------------------------

Yep, this looks like a good step. Thanks.

Regarding setFields(), the only real implementor of it is in RecordSchema. Wouldn't passing in the fields LinkedHashMap into the constructor work? The constructor would still do what it does today, but then, after the call to the Schema's constructor, it would just do the code that is in setFields() inside its constructor. I think this is more or less equivalent to what exists today, but would let you make RecordSchema's member variables immutable and final.

As Philip points out, due to setProp(), it still isn't immutable. To solve this, we could add a Map<String,String> argument to all the variants of Schema.createXXX(). We aren't using setProp() too much, so I'm not sure if this would work for all use cases, but would for our simple examples.

If putting this in a patch would help, I should have time this week.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828947#action_12828947 ] 

Thiruvalluvan M. G. commented on AVRO-261:
------------------------------------------

Perhaps a better approach (inspired by Scott's) is:
   - Have a subclass of ArrayList called LockableArrayList, In that
   - Have a boolean field _locked_, which is initially set to _false_.
   - Have a void method lock(), which sets _locked_ to _true_.
   - Override mutating methods of ArrayList, which will call the ArrayList's corresponding method if _locked_ is _false_. If _locked_ is _true_, these methods would throw.

Use LockableArrayList instead of ArrayList. After fully constructing the container of the LockableArrayList, lock it.

What do we do with iterator() method? Fortunately only mutating method of Iterator, remove, calls remove on the container. So one can remove things using iterator if the container is not locked; if the container is locked, remove() on the iterator would throw.

There is no change to the public interface of Schema. There is minimal performance overhead during construction.

A similar approach however doesn't work well with LinkedHashMap. For maps we have too many ways of modifying contents:
   - Through map's put(), putAll(), remove() and clear().
   - Through clear() of entrySet(), keySet() and valueSet().
   - Through remove(), removeAll(), retainAll() of entrySet(), keySet() and valueSet().
   - Through remove() method of iterator() of entrySet(), keySet() and valueSet().

The technique above works for the first two but not with the last two. Sad!




> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828354#action_12828354 ] 

Doug Cutting commented on AVRO-261:
-----------------------------------

> Wouldn't passing in the fields LinkedHashMap into the constructor work?

That map needs to be able to contain pointers back to this instance.  E.g.:

{code}
{"type": "record", "name": "StringList", "fields": [
  {"name", "value", "type" "string"},  
  {"name", "next", "type" "StringList"}
]}
{code}

A way to fix this would be to add a RecordSchema constructor that accepts a JsonNode and a Names, that includes the logic of the 'if (type.equals("record")' clause in Schema#parse(JsonNode,Names).  Then the new instance could be added to the Names before the constructor had returned, and referenced in recursive calls to parse().  However this would then require, e.g., genavro to construct a JsonNode to create a record schema, rather than directly construct one.

As for setProp(), that could be changed to return a new schema with that property set.  This would need to walk the tree, copying while replacing pointers to the top-level schema whose properties are being modified.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-261:
-------------------------------------

    Attachment: AVRO-261.patch

Here is a patch which addresses this issue. The Schema we have had was almost immutable except a few things.

   - Union and Enum schemas stored references to Lists supplied to their constructors. So if the original Lists were modified, the schema would get affected. A simple fix is to make a fresh list. But these schemas also return the List through getter methods, opening them up for updates. The solution is to store an unmodifiable wrapper around the fresh list. If we are sure that the original list is not available for modification, we can simply wrap the list itself without constructing a new one.
   - Similar things happened in RecordSchema's _fields_ and _fieldSchemas_ attributes. But these collections were already constructed afresh in setFields() method. So we just need to make these unmodifiable.
   - We need extra protection against setFields() being called more than once. I applied Kevin's idea having a boolean variable _locked_. It would have been nice if we can completely avoided setFields(). To avoid setFields() we need to set the fields in the constructor. But we cannot do that because we need to construct the RecordSchema and add to the _names_ map before we can fully construct its fields. This is necessary to handle recursive schemas.
   - The Field class had a non-final field _position_. But it is private and there is already protection that it cannot be modified more than once. So nothing needs to be done there.
   

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Kevin Oliver (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828375#action_12828375 ] 

Kevin Oliver commented on AVRO-261:
-----------------------------------

Gotcha, thanks Doug. Didn't grok your earlier comments on recursive schema references. Not sure if its worth the effort then, sounds like a bit of a hassle.


> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830470#action_12830470 ] 

Philip Zeyliger commented on AVRO-261:
--------------------------------------

A few problems with the current patch:

LockableIArrayList isn't:
{noformat}
 @Test
  public void testLockStuff() {
    LockableArrayList<Integer> x = new LockableArrayList<Integer>();
    x.add(13);
    x.lock();
    Iterator<Integer> y = x.iterator();
    y.next();
    y.remove(); // this doesn't fail
    // this isn't true
    assertEquals(1, x.size());
  }
{noformat}
Yes, Java's List interface is porous left and right.  If there are serious objections to depending on another jar (and I get that), let's just lift Google's ImmutableList implementation (it's apache-licensed).

Also, this patch breaks GenAvro.  GenAvro.java is generated by JavaCC, so if you changed that locally, the change didn't get propagated.

-- Philip

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch, AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-261:
-------------------------------------

    Assignee: Thiruvalluvan M. G.
      Status: Patch Available  (was: Open)

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829161#action_12829161 ] 

Doug Cutting commented on AVRO-261:
-----------------------------------

Philip, Google's ImmutableMap and ImmutableList look good: they should be fast and the map ordering is retained.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-261:
-------------------------------------

    Attachment: AVRO-261.patch

Good catch. I failed to override remove(int) method. The latest patch fixes that.

bq. If there are serious objections to depending on another jar (and I get that), let's just lift Google's ImmutableList implementation (it's apache-licensed).

I'm not so much against Google's jar. But I'm reluctant to add a jar just for this single purpose. [Though this patch is large, the changes due to LockableArrayList is quite small. The rest are cleaning up of the Schema's public interface. I think the new interface is better irrespective of LockableArrayList.]

bq. Also, this patch breaks GenAvro. GenAvro.java is generated by JavaCC, so if you changed that locally, the change didn't get propagated.

I did update GenAvro.jj. By any chance your build didn't Javacc?

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch, AVRO-261.patch, AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-261:
-------------------------------------

    Attachment:     (was: AVRO-261.patch)

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828220#action_12828220 ] 

Scott Carey commented on AVRO-261:
----------------------------------

bq. Thiru: does wrapping these not affect performance? Many operations iterate over fields or lookup enum symbols. The immutable wrapper adds another method call to each such operation. 

Generally, if the wrapper is very simple then the JIT is good at in-lining these making the wrapper 'free'.  So if its just wrapping the List api, delegating read methods and blocking write methods it usually will be fine other than the extra object allocation for the wrapper, which is less important.  
Inlining can get tripped up in a few cases -- Interfaces are much harder to inline, and polymorphic methods with more than two variations detected at runtime are not inlined, so don't get fancy.

Either way, this should be easy to test.


> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-261:
-------------------------------------

    Attachment: AVRO-261.patch

bq. I applied Kevin's idea having a boolean variable locked. 

Just noticed that the _fields_ attribute was already being checked for overwriting. So retracted the boolean.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Kevin Oliver (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792583#action_12792583 ] 

Kevin Oliver commented on AVRO-261:
-----------------------------------

I can think of 2 approaches to take here:
1) Add a boolean flag to Schema, lets call it locked, that prevents modifications. Users could call Schema.lock() which would then set the flag and prevent further mutations (on the properties, fields, etc). 
2) Convert Schema to an interface and have an implementation that never allows for modifications.

Any interest? Any preference in which approach to take?

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Priority: Minor
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-261) Allow Schemas to be immutable

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-261:
-------------------------------------

    Attachment: AVRO-261.patch

Google's immutablemap seems promising. But I think I found a solution without using immutable maps.

AVRO-388 added the _name_ field to _Schema.Field_. Schema.getFields(), which is a _LinkedHashMap<String, Field>_ is used for two purposes:
   - To iterate over the Fields. Since the _Field_ now has name, this use can be handled by a _List<Field>_. That is _getFields()_ can return _List<Field>_.
   - To get the Field for a given fieldname. This is a much less frequent use case. This can be achieved by adding a _getField(String)_ method in _Schema_. To implement it, we'll can a Map<String, Field> inside _RecordSchema_, but this map is not exposed outside _RecordSchema_.

The _getFieldSchemas()_ method is longer required as its purpose can be served by the two methods (_List<Field> getFields()_ and _Field getField(String)_ above.

As a consequence _setFields()_ can now take a _List<Field>_ instead of _LinkedHashMap<String, Field>_.

To solve the problem with setProp(), I renamed it to addProp() and changed its semantics to allow only addition of new properties (no deletion or update)

It turns out that the code in many places have become simpler with these changes.

There is no performance impact. I tested with both pre- and post-AVRO-388 GenericDatumReader.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch, AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-261) Allow Schemas to be immutable

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828198#action_12828198 ] 

Philip Zeyliger commented on AVRO-261:
--------------------------------------

I like having schemas immutable.  This patch looks like a good step in that direction.  setProp() is still problematic, however.  (It's also the reason that the primitive schemas aren't singletons: I think the reflect API annotates the primitive types with stuff.)  I also wish that we could just get rid of setFields() altogether, relying on the constructor.

> Allow Schemas to be immutable
> -----------------------------
>
>                 Key: AVRO-261
>                 URL: https://issues.apache.org/jira/browse/AVRO-261
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Kevin Oliver
>            Assignee: Thiruvalluvan M. G.
>            Priority: Minor
>         Attachments: AVRO-261.patch
>
>
> It would be nice if there was the ability to have an immutable Schema in java. 
> Without this, it makes sharing schemas risky. Also, with this, we could (lazily) cache the hashCode which is a fairly expensive operation today, especially on something like a record.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.