Posted to dev@gora.apache.org by "Yves Langisch (JIRA)" <ji...@apache.org> on 2011/04/06 18:57:05 UTC

[jira] [Created] (GORA-33) Read map entries are all DIRTY by default

Read map entries are all DIRTY by default
-----------------------------------------

                 Key: GORA-33
                 URL: https://issues.apache.org/jira/browse/GORA-33
             Project: Gora
          Issue Type: Bug
          Components: storage-hbase
    Affects Versions: 0.1-incubating
            Reporter: Yves Langisch


I'm using the following schema with the HBase module:

{
   "type": "record",
   "name": "Request",
   "namespace": "ch.test.generated",
   "fields" : [
       {
           "name": "data",
           "type": {
               "type": "map",
               "values": "long"
           }
       }
   ]
}

My map may hold hundreds of entries. Persisting them works fine. The problem appears when I get the row, add a new map entry and persist it again: I can see the additional column in the HBase shell, but all columns from the map appear to be written to the row again, which is a large overhead every time I add a single entry.

After looking through the code, I saw that all map entries are in the DIRTY state immediately after the row is read. Setting the state of all entries to CLEAN right after reading the row, and before adding the new entry, resolved the issue. The default state for all map entries read from the store should be CLEAN.
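A minimal, self-contained Java sketch of the dirty-tracking behaviour described above. `TrackedMap`, `EntryState`, `load()`, `clearStates()` and `dirtyEntries()` are hypothetical names chosen for illustration, not Gora's actual API; the sketch only assumes that a store writes the map entries still marked DIRTY when the record is persisted.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Demo class comes first so that `java DirtyMapDemo.java` finds main().
public class DirtyMapDemo {
  public static void main(String[] args) {
    // 1. Reported behaviour: the read path marks every stored entry DIRTY.
    TrackedMap<String, Long> asReported = new TrackedMap<>();
    for (long i = 0; i < 3; i++) asReported.put("col" + i, i);
    asReported.put("new", 99L);
    System.out.println(asReported.dirtyEntries().size()); // prints 4

    // 2. Reporter's workaround: reset all states to CLEAN right after reading.
    TrackedMap<String, Long> workaround = new TrackedMap<>();
    for (long i = 0; i < 3; i++) workaround.put("col" + i, i);
    workaround.clearStates();
    workaround.put("new", 99L);
    System.out.println(workaround.dirtyEntries().keySet()); // prints [new]

    // 3. Proposed default: entries read from the store start out CLEAN.
    TrackedMap<String, Long> proposed = new TrackedMap<>();
    for (long i = 0; i < 3; i++) proposed.load("col" + i, i);
    proposed.put("new", 99L);
    System.out.println(proposed.dirtyEntries().keySet()); // prints [new]
  }
}

/** Hypothetical per-entry states (only the two that matter here). */
enum EntryState { CLEAN, DIRTY }

/** Hypothetical map that remembers which entries changed since they were read. */
class TrackedMap<K, V> {
  private final Map<K, V> values = new LinkedHashMap<>();
  private final Map<K, EntryState> states = new HashMap<>();

  /** User-facing write: the entry must be persisted on the next save. */
  void put(K key, V value) {
    values.put(key, value);
    states.put(key, EntryState.DIRTY);
  }

  /** Read-path write: the entry already matches what is in the store. */
  void load(K key, V value) {
    values.put(key, value);
    states.put(key, EntryState.CLEAN);
  }

  /** Marks every entry CLEAN, as in the reporter's workaround. */
  void clearStates() {
    states.replaceAll((k, s) -> EntryState.CLEAN);
  }

  /** The entries a store would write: only those still marked DIRTY. */
  Map<K, V> dirtyEntries() {
    Map<K, V> dirty = new LinkedHashMap<>();
    for (Map.Entry<K, V> e : values.entrySet()) {
      if (states.get(e.getKey()) == EntryState.DIRTY) {
        dirty.put(e.getKey(), e.getValue());
      }
    }
    return dirty;
  }
}
```

Running `main` prints 4 and then `[new]` twice: when every loaded entry is DIRTY, adding one entry causes all four entries to be written, whereas both the workaround and the proposed CLEAN default leave only the new entry to persist.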


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GORA-33) Read map entries are all DIRTY by default

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GORA-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated GORA-33:
----------------------------------

    Fix Version/s: 0.2-incubating

> Read map entries are all DIRTY by default
> -----------------------------------------
>
>                 Key: GORA-33
>                 URL: https://issues.apache.org/jira/browse/GORA-33
>             Project: Gora
>          Issue Type: Bug
>          Components: storage-hbase
>    Affects Versions: 0.1-incubating
>            Reporter: Yves Langisch
>             Fix For: 0.2-incubating


[jira] [Updated] (GORA-33) Read map entries are all DIRTY by default

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GORA-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated GORA-33:
----------------------------------

    Fix Version/s:     (was: 0.2-incubating)
                   0.3-incubating

- push to 0.3

> Read map entries are all DIRTY by default
> -----------------------------------------
>
>                 Key: GORA-33
>                 URL: https://issues.apache.org/jira/browse/GORA-33
>             Project: Gora
>          Issue Type: Bug
>          Components: storage-hbase
>    Affects Versions: 0.1-incubating
>            Reporter: Yves Langisch
>             Fix For: 0.3-incubating


[jira] [Commented] (GORA-33) Read map entries are all DIRTY by default

Posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GORA-33?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165628#comment-13165628 ] 

Lewis John McGibbney commented on GORA-33:
------------------------------------------

Does anyone have preferences on this? We need to weigh up the pros and cons of setting the state to CLEAN. I'm willing to look further into this if there are no immediate opinions. In the meantime, is it worth thinking about other dirty loadings? GoraCompiler is a good example of where the above issue is evident.
> Read map entries are all DIRTY by default
> -----------------------------------------
>
>                 Key: GORA-33
>                 URL: https://issues.apache.org/jira/browse/GORA-33
>             Project: Gora
>          Issue Type: Bug
>          Components: storage-hbase
>    Affects Versions: 0.1-incubating
>            Reporter: Yves Langisch
>             Fix For: 0.3-incubating
