You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gora.apache.org by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/03/08 16:15:58 UTC

[jira] [Created] (GORA-105) DataStoreFactory does not properly support multiple stores

DataStoreFactory does not properly support multiple stores
----------------------------------------------------------

                 Key: GORA-105
                 URL: https://issues.apache.org/jira/browse/GORA-105
             Project: Apache Gora
          Issue Type: Bug
          Components: schema, storage
            Reporter: Ferdy Galema
            Priority: Blocker
             Fix For: 0.2


DataStoreFactory has a single, static properties field. This is completely unacceptable, because that way when multiple stores are instantiated in the same JVM, the last store instance will overwrite the "default.schema" property. This causes that all the previous stores will read a misconfigured default schema property. Beside this it may cause several other nasty future bugs. In my opinion this is a blocker because the methods on DataStoreFactory suggest that it can handle multiple stores, when as a matter fact it doesn't. 

I will attach and commit a patch that fixes this problem. It only modifies gora-core. All stores directly benefit from this bugfix because of DataStoreBase. This patch fixes the following property related problems.

-It introduces a static method createProps in DataStoreFactory. This is the equivalent of Configuration.create(). Everyone can create a new properties object and set everything interesting on it and pass it on to whatever stores they like, instead of ALL stores.
-It fixes the method javadoc of DataStoreBase#getSchemaName(String mappingSchemaName, Class<?> persistentClass). The previous description was simply wrong.
-It SERIALIZES the properties field of DataStoreBase instead of grabbing the static DataStoreFactory.properties field. This has the additional benefit of making sure that the store can be used correctly with runtime modified properties in a mapreduce context.
-It removes the caching functionality of DataStoreFactory. Because of the dynamic configuration in the Properties and Configuration object, it is very difficult to implement a correct key hash for the cache. At the moment it only uses the triple {datastoreClass, keyClass,valueClass} as a key hash. Multiple stores cannot be properly supported when the factory uses badly implemented hash keys. (For example, one might instantiate 2 SqlStores, both using the exact same {datastoreClass, keyClass,valueClass} triple, but pointing to different databases. When one is about the instantiate the second datastore, it will faulty return the first datastore from cache). We can always reintroduce caching functionality when we can implement a proper key.

The patch passes all tests. Will commit when there are no objections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GORA-105) DataStoreFactory does not properly support multiple stores

Posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GORA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225373#comment-13225373 ] 

Lewis John McGibbney commented on GORA-105:
-------------------------------------------

This arguement you've created here in this issue is about flexibility and usability and is a real great catch at this late stage in the day so to speak.
I've followed your commentary as above and only have the following comments:
* Would be great to see the new static createProps() annotated with something along the lines of "Everyone can create a new properties object and set everything interesting on it and pass it on to whatever stores they like, instead of ALL stores. ", this provides a neat justification for the change.
* Can you please expand on what you mean by {bq} We can always reintroduce caching functionality when we can implement a proper key. {bq}?
Does this require us to log the issue just now and act on it at a later stage?

All in all I'm very happy with the issue and of course the proposed solution. It's a real nice catch and can only get better once it's implemented and tested more.
                
> DataStoreFactory does not properly support multiple stores
> ----------------------------------------------------------
>
>                 Key: GORA-105
>                 URL: https://issues.apache.org/jira/browse/GORA-105
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: schema, storage
>            Reporter: Ferdy Galema
>            Priority: Blocker
>             Fix For: 0.2
>
>         Attachments: GORA-105.patch
>
>
> DataStoreFactory has a single, static properties field. This is completely unacceptable, because that way when multiple stores are instantiated in the same JVM, the last store instance will overwrite the "default.schema" property. This causes that all the previous stores will read a misconfigured default schema property. Beside this it may cause several other nasty future bugs. In my opinion this is a blocker because the methods on DataStoreFactory suggest that it can handle multiple stores, when as a matter fact it doesn't. 
> I will attach and commit a patch that fixes this problem. It only modifies gora-core. All stores directly benefit from this bugfix because of DataStoreBase. This patch fixes the following property related problems.
> -It introduces a static method createProps in DataStoreFactory. This is the equivalent of Configuration.create(). Everyone can create a new properties object and set everything interesting on it and pass it on to whatever stores they like, instead of ALL stores.
> -It fixes the method javadoc of DataStoreBase#getSchemaName(String mappingSchemaName, Class<?> persistentClass). The previous description was simply wrong.
> -It SERIALIZES the properties field of DataStoreBase instead of grabbing the static DataStoreFactory.properties field. This has the additional benefit of making sure that the store can be used correctly with runtime modified properties in a mapreduce context.
> -It removes the caching functionality of DataStoreFactory. Because of the dynamic configuration in the Properties and Configuration object, it is very difficult to implement a correct key hash for the cache. At the moment it only uses the triple {datastoreClass, keyClass,valueClass} as a key hash. Multiple stores cannot be properly supported when the factory uses badly implemented hash keys. (For example, one might instantiate 2 SqlStores, both using the exact same {datastoreClass, keyClass,valueClass} triple, but pointing to different databases. When one is about the instantiate the second datastore, it will faulty return the first datastore from cache). We can always reintroduce caching functionality when we can implement a proper key.
> The patch passes all tests. Will commit when there are no objections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GORA-105) DataStoreFactory does not properly support multiple stores

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GORA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241317#comment-13241317 ] 

Hudson commented on GORA-105:
-----------------------------

Integrated in gora-trunk #198 (See [https://builds.apache.org/job/gora-trunk/198/])
    GORA-105 DataStoreFactory does not properly support multiple stores (Revision 1306866)

     Result = SUCCESS
ferdy : 
Files : 
* /gora/trunk/CHANGES.txt
* /gora/trunk/gora-core/src/main/java/org/apache/gora/store/DataStoreFactory.java
* /gora/trunk/gora-core/src/main/java/org/apache/gora/store/impl/DataStoreBase.java
* /gora/trunk/gora-core/src/main/java/org/apache/gora/util/WritableUtils.java
* /gora/trunk/gora-core/src/test/java/org/apache/gora/GoraTestDriver.java
* /gora/trunk/gora-core/src/test/java/org/apache/gora/avro/mapreduce/TestDataFileAvroStoreMapReduce.java
* /gora/trunk/gora-core/src/test/java/org/apache/gora/avro/store/TestAvroStore.java
* /gora/trunk/gora-core/src/test/java/org/apache/gora/store/TestDataStoreFactory.java
* /gora/trunk/gora-core/src/test/java/org/apache/gora/util/TestWritableUtils.java

                
> DataStoreFactory does not properly support multiple stores
> ----------------------------------------------------------
>
>                 Key: GORA-105
>                 URL: https://issues.apache.org/jira/browse/GORA-105
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: schema, storage
>            Reporter: Ferdy Galema
>            Priority: Blocker
>             Fix For: 0.2
>
>         Attachments: GORA-105-v2.patch, GORA-105.patch
>
>
> DataStoreFactory has a single, static properties field. This is completely unacceptable, because that way when multiple stores are instantiated in the same JVM, the last store instance will overwrite the "default.schema" property. This causes that all the previous stores will read a misconfigured default schema property. Beside this it may cause several other nasty future bugs. In my opinion this is a blocker because the methods on DataStoreFactory suggest that it can handle multiple stores, when as a matter fact it doesn't. 
> I will attach and commit a patch that fixes this problem. It only modifies gora-core. All stores directly benefit from this bugfix because of DataStoreBase. This patch fixes the following property related problems.
> -It introduces a static method createProps in DataStoreFactory. This is the equivalent of Configuration.create(). Everyone can create a new properties object and set everything interesting on it and pass it on to whatever stores they like, instead of ALL stores.
> -It fixes the method javadoc of DataStoreBase#getSchemaName(String mappingSchemaName, Class<?> persistentClass). The previous description was simply wrong.
> -It SERIALIZES the properties field of DataStoreBase instead of grabbing the static DataStoreFactory.properties field. This has the additional benefit of making sure that the store can be used correctly with runtime modified properties in a mapreduce context.
> -It removes the caching functionality of DataStoreFactory. Because of the dynamic configuration in the Properties and Configuration object, it is very difficult to implement a correct key hash for the cache. At the moment it only uses the triple {datastoreClass, keyClass,valueClass} as a key hash. Multiple stores cannot be properly supported when the factory uses badly implemented hash keys. (For example, one might instantiate 2 SqlStores, both using the exact same {datastoreClass, keyClass,valueClass} triple, but pointing to different databases. When one is about the instantiate the second datastore, it will faulty return the first datastore from cache). We can always reintroduce caching functionality when we can implement a proper key.
> The patch passes all tests. Will commit when there are no objections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GORA-105) DataStoreFactory does not properly support multiple stores

Posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GORA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated GORA-105:
------------------------------

    Attachment: GORA-105.patch
    
> DataStoreFactory does not properly support multiple stores
> ----------------------------------------------------------
>
>                 Key: GORA-105
>                 URL: https://issues.apache.org/jira/browse/GORA-105
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: schema, storage
>            Reporter: Ferdy Galema
>            Priority: Blocker
>             Fix For: 0.2
>
>         Attachments: GORA-105.patch
>
>
> DataStoreFactory has a single, static properties field. This is completely unacceptable, because that way when multiple stores are instantiated in the same JVM, the last store instance will overwrite the "default.schema" property. This causes that all the previous stores will read a misconfigured default schema property. Beside this it may cause several other nasty future bugs. In my opinion this is a blocker because the methods on DataStoreFactory suggest that it can handle multiple stores, when as a matter fact it doesn't. 
> I will attach and commit a patch that fixes this problem. It only modifies gora-core. All stores directly benefit from this bugfix because of DataStoreBase. This patch fixes the following property related problems.
> -It introduces a static method createProps in DataStoreFactory. This is the equivalent of Configuration.create(). Everyone can create a new properties object and set everything interesting on it and pass it on to whatever stores they like, instead of ALL stores.
> -It fixes the method javadoc of DataStoreBase#getSchemaName(String mappingSchemaName, Class<?> persistentClass). The previous description was simply wrong.
> -It SERIALIZES the properties field of DataStoreBase instead of grabbing the static DataStoreFactory.properties field. This has the additional benefit of making sure that the store can be used correctly with runtime modified properties in a mapreduce context.
> -It removes the caching functionality of DataStoreFactory. Because of the dynamic configuration in the Properties and Configuration object, it is very difficult to implement a correct key hash for the cache. At the moment it only uses the triple {datastoreClass, keyClass,valueClass} as a key hash. Multiple stores cannot be properly supported when the factory uses badly implemented hash keys. (For example, one might instantiate 2 SqlStores, both using the exact same {datastoreClass, keyClass,valueClass} triple, but pointing to different databases. When one is about the instantiate the second datastore, it will faulty return the first datastore from cache). We can always reintroduce caching functionality when we can implement a proper key.
> The patch passes all tests. Will commit when there are no objections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GORA-105) DataStoreFactory does not properly support multiple stores

Posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GORA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated GORA-105:
------------------------------

    Attachment: GORA-105-v2.patch

Done. Annotating the various creator methods was... inspiring. To reduce the levels of indirection I made sure that all methods directly call the "end creator method", that is the method that actually creates and initializes the store. That should help with debugging and maintaining the code.
                
> DataStoreFactory does not properly support multiple stores
> ----------------------------------------------------------
>
>                 Key: GORA-105
>                 URL: https://issues.apache.org/jira/browse/GORA-105
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: schema, storage
>            Reporter: Ferdy Galema
>            Priority: Blocker
>             Fix For: 0.2
>
>         Attachments: GORA-105-v2.patch, GORA-105.patch
>
>
> DataStoreFactory has a single, static properties field. This is completely unacceptable, because that way when multiple stores are instantiated in the same JVM, the last store instance will overwrite the "default.schema" property. This causes that all the previous stores will read a misconfigured default schema property. Beside this it may cause several other nasty future bugs. In my opinion this is a blocker because the methods on DataStoreFactory suggest that it can handle multiple stores, when as a matter fact it doesn't. 
> I will attach and commit a patch that fixes this problem. It only modifies gora-core. All stores directly benefit from this bugfix because of DataStoreBase. This patch fixes the following property related problems.
> -It introduces a static method createProps in DataStoreFactory. This is the equivalent of Configuration.create(). Everyone can create a new properties object and set everything interesting on it and pass it on to whatever stores they like, instead of ALL stores.
> -It fixes the method javadoc of DataStoreBase#getSchemaName(String mappingSchemaName, Class<?> persistentClass). The previous description was simply wrong.
> -It SERIALIZES the properties field of DataStoreBase instead of grabbing the static DataStoreFactory.properties field. This has the additional benefit of making sure that the store can be used correctly with runtime modified properties in a mapreduce context.
> -It removes the caching functionality of DataStoreFactory. Because of the dynamic configuration in the Properties and Configuration object, it is very difficult to implement a correct key hash for the cache. At the moment it only uses the triple {datastoreClass, keyClass,valueClass} as a key hash. Multiple stores cannot be properly supported when the factory uses badly implemented hash keys. (For example, one might instantiate 2 SqlStores, both using the exact same {datastoreClass, keyClass,valueClass} triple, but pointing to different databases. When one is about the instantiate the second datastore, it will faulty return the first datastore from cache). We can always reintroduce caching functionality when we can implement a proper key.
> The patch passes all tests. Will commit when there are no objections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Closed] (GORA-105) DataStoreFactory does not properly support multiple stores

Posted by "Ferdy Galema (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GORA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema closed GORA-105.
-----------------------------

    Resolution: Fixed

committed.
                
> DataStoreFactory does not properly support multiple stores
> ----------------------------------------------------------
>
>                 Key: GORA-105
>                 URL: https://issues.apache.org/jira/browse/GORA-105
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: schema, storage
>            Reporter: Ferdy Galema
>            Priority: Blocker
>             Fix For: 0.2
>
>         Attachments: GORA-105-v2.patch, GORA-105.patch
>
>
> DataStoreFactory has a single, static properties field. This is completely unacceptable, because that way when multiple stores are instantiated in the same JVM, the last store instance will overwrite the "default.schema" property. This causes that all the previous stores will read a misconfigured default schema property. Beside this it may cause several other nasty future bugs. In my opinion this is a blocker because the methods on DataStoreFactory suggest that it can handle multiple stores, when as a matter fact it doesn't. 
> I will attach and commit a patch that fixes this problem. It only modifies gora-core. All stores directly benefit from this bugfix because of DataStoreBase. This patch fixes the following property related problems.
> -It introduces a static method createProps in DataStoreFactory. This is the equivalent of Configuration.create(). Everyone can create a new properties object and set everything interesting on it and pass it on to whatever stores they like, instead of ALL stores.
> -It fixes the method javadoc of DataStoreBase#getSchemaName(String mappingSchemaName, Class<?> persistentClass). The previous description was simply wrong.
> -It SERIALIZES the properties field of DataStoreBase instead of grabbing the static DataStoreFactory.properties field. This has the additional benefit of making sure that the store can be used correctly with runtime modified properties in a mapreduce context.
> -It removes the caching functionality of DataStoreFactory. Because of the dynamic configuration in the Properties and Configuration object, it is very difficult to implement a correct key hash for the cache. At the moment it only uses the triple {datastoreClass, keyClass,valueClass} as a key hash. Multiple stores cannot be properly supported when the factory uses badly implemented hash keys. (For example, one might instantiate 2 SqlStores, both using the exact same {datastoreClass, keyClass,valueClass} triple, but pointing to different databases. When one is about the instantiate the second datastore, it will faulty return the first datastore from cache). We can always reintroduce caching functionality when we can implement a proper key.
> The patch passes all tests. Will commit when there are no objections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GORA-105) DataStoreFactory does not properly support multiple stores

Posted by "Ferdy Galema (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GORA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225456#comment-13225456 ] 

Ferdy Galema commented on GORA-105:
-----------------------------------

"the new static createProps() annotated"
That's a good idea. It's going to be an important method after all. Will include it in the next patch.

"Does this require us to log the issue just now and act on it at a later stage? "
Yes that is what I propose. I will create an issue that is for implementing caching, the right way. Because this needs to be very well thought out, I did not intend to include it with this issue. Simply disabling cache should be right for now because this will prevent unexpected behaviour. (Like the scenario I sketched above with the two different databases). I do not expect this change to have negative impact.

In any case I am more than happy to also provide annotations for the "createDataStore" methods to clarify their exact behaviour.
                
> DataStoreFactory does not properly support multiple stores
> ----------------------------------------------------------
>
>                 Key: GORA-105
>                 URL: https://issues.apache.org/jira/browse/GORA-105
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: schema, storage
>            Reporter: Ferdy Galema
>            Priority: Blocker
>             Fix For: 0.2
>
>         Attachments: GORA-105.patch
>
>
> DataStoreFactory has a single, static properties field. This is completely unacceptable, because that way when multiple stores are instantiated in the same JVM, the last store instance will overwrite the "default.schema" property. This causes that all the previous stores will read a misconfigured default schema property. Beside this it may cause several other nasty future bugs. In my opinion this is a blocker because the methods on DataStoreFactory suggest that it can handle multiple stores, when as a matter fact it doesn't. 
> I will attach and commit a patch that fixes this problem. It only modifies gora-core. All stores directly benefit from this bugfix because of DataStoreBase. This patch fixes the following property related problems.
> -It introduces a static method createProps in DataStoreFactory. This is the equivalent of Configuration.create(). Everyone can create a new properties object and set everything interesting on it and pass it on to whatever stores they like, instead of ALL stores.
> -It fixes the method javadoc of DataStoreBase#getSchemaName(String mappingSchemaName, Class<?> persistentClass). The previous description was simply wrong.
> -It SERIALIZES the properties field of DataStoreBase instead of grabbing the static DataStoreFactory.properties field. This has the additional benefit of making sure that the store can be used correctly with runtime modified properties in a mapreduce context.
> -It removes the caching functionality of DataStoreFactory. Because of the dynamic configuration in the Properties and Configuration object, it is very difficult to implement a correct key hash for the cache. At the moment it only uses the triple {datastoreClass, keyClass,valueClass} as a key hash. Multiple stores cannot be properly supported when the factory uses badly implemented hash keys. (For example, one might instantiate 2 SqlStores, both using the exact same {datastoreClass, keyClass,valueClass} triple, but pointing to different databases. When one is about the instantiate the second datastore, it will faulty return the first datastore from cache). We can always reintroduce caching functionality when we can implement a proper key.
> The patch passes all tests. Will commit when there are no objections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira