Posted to user@accumulo.apache.org by Geoffry Roberts <th...@gmail.com> on 2014/04/24 21:59:47 UTC

Embedded Mutations: Is this kind of thing done?

All,

I am in the throes of converting someone else's code from MongoDB to
Accumulo.  I am seeing a situation where one DBObject is being embedded
into another DBObject.  I see that Mutation supports a method called
getRow() that returns a byte array.  I gather I can use this to achieve a
similar result if I were so inclined.

Am I so inclined?  i.e. Is this the way we do things in Accumulo?

DBObject, roughly speaking, is Mongo's counterpart to Mutation.

Thanks mucho

-- 
There are ways and there are ways,

Geoffry Roberts

Re: Embedded Mutations: Is this kind of thing done?

Posted by Eric Newton <er...@gmail.com>.

I don't have detailed knowledge of your key, but generally speaking:

A row can have billions of columns.  There is no assumption in Accumulo
that the row will fit in memory.  Of course, a single mutation will need to
fit in memory.

A row will always be served from just a single server, so it's important to
have enough rows to spread the ingest/query load out over your cluster.

-Eric



On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts <th...@gmail.com> wrote:

> Interesting, multiple mutations that is.  Are we talking multiples on the
> same row id?
>
> Upon reflection, I realized the embedded thing is nothing special.  I
> think I'll keep adding columns to a single mutation.  This will make for a
> wide row, but I'm not seeing that as a problem.  Am I being naive?
>
> Another question if I may.  As I walk my graph, I must keep track of the
> type of the value being persisted.  I am using the qualifier for this,
> putting in it a URI that indicates the type.  Is this a proper use for the
> qualifier?
>
> Thanks for the discussion
>
>
> On Thu, Apr 24, 2014 at 11:23 PM, William Slacum <
> wilhelm.von.cloud@accumulo.net> wrote:
>
>> Depending on your table schema, you'll probably want to translate an
>> object graph into multiple mutations.
>>
>>
>> On Thu, Apr 24, 2014 at 8:40 PM, David Medinets <david.medinets@gmail.com
>> > wrote:
>>
>>> If the sub-document changes, you'll need to search the values of every
>>> Accumulo entry?
>>>
>>>
>>> On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts <threadedblue@gmail.com
>>> > wrote:
>>>
>>>> The use case is, I am walking a complex object graph and persisting
>>>> what I find there.  Said object graph in my case is always EMF (Eclipse
>>>> Modeling Framework) compliant.  An EMF graph can have in it references
>>>> to--brace yourself--a non-cross document containment reference.  When using
>>>> Mongo, these were persisted as a DBObject embedded into a containing
>>>> DBObject.  I'm trying to decide whether I want to follow suit.
>>>>
>>>> Any thoughts?
>>>>
>>>>
>>>> On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>>>
>>>>> Can you describe the use case more? Do you know what the purpose of
>>>>> the embedded changes is?
>>>>>
>>>>>
>>>>> On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <
>>>>> threadedblue@gmail.com> wrote:
>>>>>
>>>>>> All,
>>>>>>
>>>>>> I am in the throes of converting someone else's code from MongoDB to
>>>>>> Accumulo.  I am seeing a situation where one DBObject is being embedded
>>>>>> into another DBObject.  I see that Mutation supports a method called
>>>>>> getRow()  that returns a byte array.  I gather I can use this to achieve a
>>>>>> similar result if I were so inclined.
>>>>>>
>>>>>> Am I so inclined?  i.e. Is this the way we do things in Accumulo?
>>>>>>
>>>>>> DBObject, roughly speaking, is Mongo's counterpart to Mutation.
>>>>>>
>>>>>> Thanks mucho
>>>>>>
>>>>>> --
>>>>>> There are ways and there are ways,
>>>>>>
>>>>>> Geoffry Roberts
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> There are ways and there are ways,
>>>>
>>>> Geoffry Roberts
>>>>
>>>
>>>
>>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Embedded Mutations: Is this kind of thing done?

Posted by Geoffry Roberts <th...@gmail.com>.

I think you told me something.  I must watch the rowid colfam colq sequence
and be sure it is unique within the row.  Will do.  I believe I do have
distinct datatypes for now (they're medical) but the future may rear its
ugly head.


On Fri, Apr 25, 2014 at 11:02 AM, Josh Elser <jo...@gmail.com> wrote:

> I might be causing more confusion. Consider the following:
>
> {"name":"Josh", "age":85}
>
> If you stored the attribute name in the colf and the type (string or int)
> in the colq, it works fine for the above document.
>
> Now consider the following document, say where there were multiple sources
> of my age and we didn't know which was reliable:
>
> {"name":"Josh", "age":[40,85]}
>
> In the aforementioned scheme, "rowid age:int -> 40" and "rowid age:int ->
> 85" would collapse on one another. These are the Map (as in your
> java.util.Map) semantics that Accumulo provides.
>
> If you have very distinct data types (which it appears you do), this might
> not be of concern to you. Just be cognizant in your translation from EMF to
> Key that you aren't creating duplicate Keys unexpectedly.


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Embedded Mutations: Is this kind of thing done?

Posted by Josh Elser <jo...@gmail.com>.

I might be causing more confusion. Consider the following:

{"name":"Josh", "age":85}

If you stored the attribute name in the colf and the type (string or 
int) in the colq, it works fine for the above document.

Now consider the following document, say where there were multiple
sources of my age and we didn't know which was reliable:

{"name":"Josh", "age":[40,85]}

In the aforementioned scheme, "rowid age:int -> 40" and "rowid age:int 
-> 85" would collapse on one another. These are the Map (as in your 
java.util.Map) semantics that Accumulo provides.

If you have very distinct data types (which it appears you do), this 
might not be of concern to you. Just be cognizant in your translation 
from EMF to Key that you aren't creating duplicate Keys unexpectedly.
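
A minimal sketch of that collapse, using a plain java.util.TreeMap as a
stand-in for the table.  No Accumulo API is involved here; the class and
method names are made up for illustration, and the timestamp and
visibility parts of a real Accumulo Key are deliberately ignored.

```java
import java.util.Map;
import java.util.TreeMap;

public class KeyCollapseSketch {
    // Hypothetical stand-in for a key, written as "row colfam:colq".
    // Timestamp and visibility are left out of this simplified model.
    static String key(String row, String colFam, String colQual) {
        return row + " " + colFam + ":" + colQual;
    }

    // Stores the document {"name":"Josh", "age":[40,85]} with the
    // attribute name in the colfam and the type in the colq.
    static Map<String, String> storeAges() {
        Map<String, String> table = new TreeMap<>();
        table.put(key("rowid", "name", "string"), "Josh");
        table.put(key("rowid", "age", "int"), "40");
        // Same key as the previous put, so this value replaces 40.
        table.put(key("rowid", "age", "int"), "85");
        return table;
    }

    public static void main(String[] args) {
        // Print every surviving entry in sorted key order.
        for (Map.Entry<String, String> e : storeAges().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running it prints only two entries, "rowid age:int -> 85" and
"rowid name:string -> Josh"; the value 40 is gone.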


Re: Embedded Mutations: Is this kind of thing done?

Posted by Geoffry Roberts <th...@gmail.com>.

Ok Josh, you have me worried.

I am storing the object's name in the colfam, e.g. "patientId"; the
object's data type goes in the colq, e.g. "org.hl7.v3.II"; then the value
goes in the colval.  I think the largest graph I'm likely to have is < 5k,
and you say I could have memory problems.  This is a good topic.  How then
can I estimate?


On Fri, Apr 25, 2014 at 10:17 AM, Josh Elser <jo...@gmail.com> wrote:

> Not necessarily. If you are storing just the type in the colq and have one
> value and type per document/row, you won't have a problem. If you have more
> than one value of a type per document/row, the last one you inserted will
> be what sticks (which is likely undesirable).
>
> Of course, this is also assuming there isn't some other uniquely
> identifying attribute in the colfam.


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Embedded Mutations: Is this kind of thing done?

Posted by Josh Elser <jo...@gmail.com>.

Not necessarily. If you are storing just the type in the colq and have
one value and type per document/row, you won't have a problem. If you
have more than one value of a type per document/row, the last one you
inserted will be what sticks (which is likely undesirable).

Of course, this is also assuming there isn't some other uniquely 
identifying attribute in the colfam.
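
To illustrate that last point, here is a rough sketch in which each value
gets its own key because the colfam carries a distinguishing attribute.
It again models the table with java.util.TreeMap rather than the Accumulo
API, and the "#index" suffix is a purely hypothetical convention chosen
for the example, not something prescribed by Accumulo.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UniqueKeySketch {
    // Stores every value of a multi-valued attribute under its own key,
    // written as "row colfam:colq" (timestamp and visibility omitted).
    static Map<String, String> store(String row, String attribute,
                                     String type, List<String> values) {
        Map<String, String> table = new TreeMap<>();
        for (int i = 0; i < values.size(); i++) {
            // Hypothetical convention: the position makes the colfam unique.
            String colFam = attribute + "#" + i;
            table.put(row + " " + colFam + ":" + type, values.get(i));
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table =
            store("rowid", "age", "int", Arrays.asList("40", "85"));
        // Both values survive because no two puts share a key.
        for (Map.Entry<String, String> e : table.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With this layout both "rowid age#0:int -> 40" and "rowid age#1:int -> 85"
are kept.  Whether the distinguishing part belongs in the colfam or the
colq depends on the table schema.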


Re: Embedded Mutations: Is this kind of thing done?

Posted by Geoffry Roberts <th...@gmail.com>.

Thanks for the comments.

I'm using the qualifier to tell me the type of the value.  Sounds like I'm
misusing it.

My EMF documents are running no more than 5k, so I gather a row will fit
into memory well enough.


On Fri, Apr 25, 2014 at 9:29 AM, Mike Drob <ma...@cloudera.com> wrote:

> Large rows are only an issue if you are going to try to put the entire row
> in memory at once. As long as you have small enough entries in the row, and
> can treat them individually, you should be fine.
>
> The qualifier is anything that you want to use to determine uniqueness
> across keys. So yes, this sounds fine, although possibly not fine-grained
> enough.
>
> Mike
>
>
> On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts <th...@gmail.com>wrote:
>
>> Interesting, multiple mutations that is.  Are we talking multiples on the
>> same row id?
>>
>> Upon reflection, I realized the embedded thing is nothing special.  I
>> think I'll keep adding columns to a single mutation.  This will make for a
>> wide row, but I'm not seeing that as a problem.  Am I being naive?
>>
>> Another question if I may.  As I walk my graph, I must keep track of the
>> type of the value being persisted.  I am using the qualifier for this,
>> putting in it a URI that indicates the type.  Is this a proper use for the
>> qualifier?
>>
>> Thanks for the discussion
>>
>>
>> On Thu, Apr 24, 2014 at 11:23 PM, William Slacum <
>> wilhelm.von.cloud@accumulo.net> wrote:
>>
>>> Depending on your table schema, you'll probably want to translate an
>>> object graph into multiple mutations.
>>>
>>>
>>> On Thu, Apr 24, 2014 at 8:40 PM, David Medinets <
>>> david.medinets@gmail.com> wrote:
>>>
>>>> If the sub-document changes, you'll need to search the values of every
>>>> Accumulo entry?
>>>>
>>>>
>>>> On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts <
>>>> threadedblue@gmail.com> wrote:
>>>>
>>>>> The use case is, I am walking a complex object graph and persisting
>>>>> what I find there.  Said object graph in my case is always EMF (Eclipse
>>>>> Modeling Framework) compliant.  An EMF graph can have in it references
>>>>> to--brace yourself--a non-cross document containment reference.  When using
>>>>> Mongo, these were persisted as a DBObject embedded into a containing
>>>>> DBObject.  I'm trying to decide whether I want to follow suit.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>>
>>>>> On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey <bu...@cloudera.com>wrote:
>>>>>
>>>>>> Can you describe the use case more? Do you know what the purpose of
>>>>>> the embedded changes is?
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <
>>>>>> threadedblue@gmail.com> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> I am in the throes of converting some(else's) code from MongoDB to
>>>>>>> Accumulo.  I am seeing a situation where one DBObject is being embedded
>>>>>>> into another DBObject.  I see that Mutation supports a method called
>>>>>>> getRow()  that returns a byte array.  I gather I can use this to achieve a
>>>>>>> similar result if I were so inclined.
>>>>>>>
>>>>>>> Am I so inclined?  i.e. Is this the way we do things in Accumulo?
>>>>>>>
>>>>>>> DBObject, roughly speaking, is Mongo's counterpart to Mutation.
>>>>>>>
>>>>>>> Thanks mucho
>>>>>>>
>>>>>>> --
>>>>>>> There are ways and there are ways,
>>>>>>>
>>>>>>> Geoffry Roberts
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> There are ways and there are ways,
>>>>>
>>>>> Geoffry Roberts
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
>


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Embedded Mutations: Is this kind of thing done?

Posted by Mike Drob <ma...@cloudera.com>.

Large rows are only an issue if you are going to try to put the entire row
in memory at once. As long as you have small enough entries in the row, and
can treat them individually, you should be fine.

The qualifier is anything that you want to use to determine uniqueness
across keys. So yes, this sounds fine, although possibly not fine-grained
enough.

Mike
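
To illustrate the "not fine-grained enough" caveat, here is a minimal sketch in plain Java (no Accumulo dependency). It assumes a simplified key made of row, family, and qualifier only; real Accumulo keys also carry a visibility and a timestamp, and with the default versioning settings only the newest version of a key is returned. The class name, the "attr" family, the "doc1" row, and the feature names are all hypothetical. If the qualifier holds nothing but the type URI, two values of the same type in the same row share one key, so the second replaces the first.

```java
import java.util.Map;
import java.util.TreeMap;

final class QualifierUniqueness {
    // Illustrative type URI; any type identifier would behave the same way.
    static final String STRING_TYPE = "http://www.w3.org/2001/XMLSchema#string";

    // Simplified key: row, family, and qualifier joined with a NUL separator.
    // Real Accumulo keys also carry a visibility and a timestamp, ignored here.
    static String key(String row, String family, String qualifier) {
        return row + '\u0000' + family + '\u0000' + qualifier;
    }

    // Qualifier holds only the type URI: two string-typed values share one key.
    static Map<String, String> typeOnlyQualifier() {
        Map<String, String> table = new TreeMap<>();
        table.put(key("doc1", "attr", STRING_TYPE), "Alice");
        table.put(key("doc1", "attr", STRING_TYPE), "Smith"); // replaces "Alice"
        return table;
    }

    // Qualifier holds a feature name plus the type URI: each value keeps its own key.
    static Map<String, String> nameAndTypeQualifier() {
        Map<String, String> table = new TreeMap<>();
        table.put(key("doc1", "attr", "firstName|" + STRING_TYPE), "Alice");
        table.put(key("doc1", "attr", "lastName|" + STRING_TYPE), "Smith");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(typeOnlyQualifier().size());    // prints 1
        System.out.println(nameAndTypeQualifier().size()); // prints 2
    }
}
```

Whether the feature name belongs in the qualifier or in the column family depends on the table schema; the sketch only shows that something besides the type has to distinguish the keys.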


On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts <th...@gmail.com> wrote:

> Interesting, multiple mutations that is.  Are we talking multiples on the
> same row id?
>
> Upon reflection, I realized the embedded thing is nothing special.  I
> think I'll keep adding columns to a single mutation.  This will make for a
> wide row, but I'm not seeing that as a problem.  Am I being naive?
>
> Another question if I may.  As I walk my graph, I must keep track of the
> type of the value being persisted.  I am using the qualifier for this,
> putting in it a URI that indicates the type.  Is this a proper use for the
> qualifier?
>
> Thanks for the discussion
>
>
> On Thu, Apr 24, 2014 at 11:23 PM, William Slacum <
> wilhelm.von.cloud@accumulo.net> wrote:
>
>> Depending on your table schema, you'll probably want to translate an
>> object graph into multiple mutations.
>>
>>
>> On Thu, Apr 24, 2014 at 8:40 PM, David Medinets <david.medinets@gmail.com
>> > wrote:
>>
>>> If the sub-document changes, you'll need to search the values of every
>>> Accumulo entry?
>>>
>>>
>>> On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts <threadedblue@gmail.com
>>> > wrote:
>>>
>>>> The use case is, I am walking a complex object graph and persisting
>>>> what I find there.  Said object graph in my case is always EMF (Eclipse
>>>> Modeling Framework) compliant.  An EMF graph can have in it references
>>>> to--brace yourself--a non-cross document containment reference.  When using
>>>> Mongo, these were persisted as a DBObject embedded into a containing
>>>> DBObject.  I'm trying to decide whether I want to follow suit.
>>>>
>>>> Any thoughts?
>>>>
>>>>
>>>> On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey <bu...@cloudera.com>wrote:
>>>>
>>>>> Can you describe the use case more? Do you know what the purpose of
>>>>> the embedded changes is?
>>>>>
>>>>>
>>>>> On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <
>>>>> threadedblue@gmail.com> wrote:
>>>>>
>>>>>> All,
>>>>>>
>>>>>> I am in the throes of converting some(else's) code from MongoDB to
>>>>>> Accumulo.  I am seeing a situation where one DBObject is being embedded
>>>>>> into another DBObject.  I see that Mutation supports a method called
>>>>>> getRow()  that returns a byte array.  I gather I can use this to achieve a
>>>>>> similar result if I were so inclined.
>>>>>>
>>>>>> Am I so inclined?  i.e. Is this the way we do things in Accumulo?
>>>>>>
>>>>>> DBObject, roughly speaking, is Mongo's counterpart to Mutation.
>>>>>>
>>>>>> Thanks mucho
>>>>>>
>>>>>> --
>>>>>> There are ways and there are ways,
>>>>>>
>>>>>> Geoffry Roberts
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> There are ways and there are ways,
>>>>
>>>> Geoffry Roberts
>>>>
>>>
>>>
>>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Embedded Mutations: Is this kind of thing done?

Posted by Geoffry Roberts <th...@gmail.com>.

Interesting, multiple mutations that is.  Are we talking multiples on the
same row id?

Upon reflection, I realized the embedded thing is nothing special.  I think
I'll keep adding columns to a single mutation.  This will make for a wide
row, but I'm not seeing that as a problem.  Am I being naive?

Another question if I may.  As I walk my graph, I must keep track of the
type of the value being persisted.  I am using the qualifier for this,
putting in it a URI that indicates the type.  Is this a proper use for the
qualifier?

Thanks for the discussion


On Thu, Apr 24, 2014 at 11:23 PM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> Depending on your table schema, you'll probably want to translate an
> object graph into multiple mutations.
>
>
> On Thu, Apr 24, 2014 at 8:40 PM, David Medinets <da...@gmail.com>wrote:
>
>> If the sub-document changes, you'll need to search the values of every
>> Accumulo entry?
>>
>>
>> On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts <th...@gmail.com>wrote:
>>
>>> The use case is, I am walking a complex object graph and persisting what
>>> I find there.  Said object graph in my case is always EMF (Eclipse Modeling
>>> Framework) compliant.  An EMF graph can have in it references to--brace
>>> yourself--a non-cross document containment reference.  When using Mongo,
>>> these were persisted as a DBObject embedded into a containing DBObject.
>>>  I'm trying to decide whether I want to follow suit.
>>>
>>> Any thoughts?
>>>
>>>
>>> On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey <bu...@cloudera.com>wrote:
>>>
>>>> Can you describe the use case more? Do you know what the purpose of
>>>> the embedded changes is?
>>>>
>>>>
>>>> On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <
>>>> threadedblue@gmail.com> wrote:
>>>>
>>>>> All,
>>>>>
>>>>> I am in the throes of converting some(else's) code from MongoDB to
>>>>> Accumulo.  I am seeing a situation where one DBObject is being embedded
>>>>> into another DBObject.  I see that Mutation supports a method called
>>>>> getRow()  that returns a byte array.  I gather I can use this to achieve a
>>>>> similar result if I were so inclined.
>>>>>
>>>>> Am I so inclined?  i.e. Is this the way we do things in Accumulo?
>>>>>
>>>>> DBObject, roughly speaking, is Mongo's counterpart to Mutation.
>>>>>
>>>>> Thanks mucho
>>>>>
>>>>> --
>>>>> There are ways and there are ways,
>>>>>
>>>>> Geoffry Roberts
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> There are ways and there are ways,
>>>
>>> Geoffry Roberts
>>>
>>
>>
>


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Embedded Mutations: Is this kind of thing done?

Posted by William Slacum <wi...@accumulo.net>.

Depending on your table schema, you'll probably want to translate an object
graph into multiple mutations.
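
One way this could look, as a minimal sketch in plain Java with no Accumulo dependency: each object in the graph becomes its own row, and a containment reference is stored as a column that points at the child's row id instead of embedding the child. `Node` and `RowUpdate` are hypothetical stand-ins (not EMF or Accumulo types), and the "attr" and "contains" family names are assumptions. With the real client, each `RowUpdate` would presumably correspond to one `Mutation` for that row. The sketch also assumes containment forms a tree, so the walk needs no cycle check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GraphFlattener {
    // Walks the containment tree depth-first, emitting one RowUpdate per node.
    static List<RowUpdate> flatten(Node root) {
        List<RowUpdate> out = new ArrayList<>();
        visit(root, out);
        return out;
    }

    private static void visit(Node node, List<RowUpdate> out) {
        RowUpdate update = new RowUpdate(node.id);
        for (Map.Entry<String, String> e : node.attributes.entrySet()) {
            update.put("attr", e.getKey(), e.getValue());
        }
        for (Node child : node.contained) {
            // Store a pointer to the child's row rather than embedding the child.
            update.put("contains", child.id, "");
        }
        out.add(update);
        for (Node child : node.contained) {
            visit(child, out);
        }
    }

    public static void main(String[] args) {
        Node doc = new Node("doc1");
        doc.attributes.put("name", "example");
        Node part = new Node("doc1/part1");
        part.attributes.put("size", "42");
        doc.contained.add(part);
        for (RowUpdate u : flatten(doc)) {
            for (String[] c : u.columns) {
                System.out.println(u.row + " " + c[0] + ":" + c[1] + " = " + c[2]);
            }
        }
    }
}

// Hypothetical stand-in for a node in the object graph.
final class Node {
    final String id;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<Node> contained = new ArrayList<>();

    Node(String id) {
        this.id = id;
    }
}

// Hypothetical stand-in for one mutation: a row id plus its column updates.
final class RowUpdate {
    final String row;
    final List<String[]> columns = new ArrayList<>(); // {family, qualifier, value}

    RowUpdate(String row) {
        this.row = row;
    }

    void put(String family, String qualifier, String value) {
        columns.add(new String[] {family, qualifier, value});
    }
}
```

Keeping each contained object in its own row means a change to a sub-document touches one row, at the cost of an extra lookup when following a containment reference.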


On Thu, Apr 24, 2014 at 8:40 PM, David Medinets <da...@gmail.com> wrote:

> If the sub-document changes, you'll need to search the values of every
> Accumulo entry?
>
>
> On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts <th...@gmail.com>wrote:
>
>> The use case is, I am walking a complex object graph and persisting what
>> I find there.  Said object graph in my case is always EMF (Eclipse Modeling
>> Framework) compliant.  An EMF graph can have in it references to--brace
>> yourself--a non-cross document containment reference.  When using Mongo,
>> these were persisted as a DBObject embedded into a containing DBObject.
>>  I'm trying to decide whether I want to follow suit.
>>
>> Any thoughts?
>>
>>
>> On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Can you describe the use case more? Do you know what the purpose of the
>>> embedded changes is?
>>>
>>>
>>> On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <threadedblue@gmail.com
>>> > wrote:
>>>
>>>> All,
>>>>
>>>> I am in the throes of converting some(else's) code from MongoDB to
>>>> Accumulo.  I am seeing a situation where one DBObject is being embedded
>>>> into another DBObject.  I see that Mutation supports a method called
>>>> getRow()  that returns a byte array.  I gather I can use this to achieve a
>>>> similar result if I were so inclined.
>>>>
>>>> Am I so inclined?  i.e. Is this the way we do things in Accumulo?
>>>>
>>>> DBObject, roughly speaking, is Mongo's counterpart to Mutation.
>>>>
>>>> Thanks mucho
>>>>
>>>> --
>>>> There are ways and there are ways,
>>>>
>>>> Geoffry Roberts
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
>

Re: Embedded Mutations: Is this kind of thing done?

Posted by David Medinets <da...@gmail.com>.

If the sub-document changes, you'll need to search the values of every
Accumulo entry?


On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts <th...@gmail.com> wrote:

> The use case is, I am walking a complex object graph and persisting what I
> find there.  Said object graph in my case is always EMF (Eclipse Modeling
> Framework) compliant.  An EMF graph can have in it references to--brace
> yourself--a non-cross document containment reference.  When using Mongo,
> these were persisted as a DBObject embedded into a containing DBObject.
>  I'm trying to decide whether I want to follow suit.
>
> Any thoughts?
>
>
> On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Can you describe the use case more? Do you know what the purpose of the
>> embedded changes is?
>>
>>
>> On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <th...@gmail.com>wrote:
>>
>>> All,
>>>
>>> I am in the throes of converting some(else's) code from MongoDB to
>>> Accumulo.  I am seeing a situation where one DBObject is being embedded
>>> into another DBObject.  I see that Mutation supports a method called
>>> getRow()  that returns a byte array.  I gather I can use this to achieve a
>>> similar result if I were so inclined.
>>>
>>> Am I so inclined?  i.e. Is this the way we do things in Accumulo?
>>>
>>> DBObject, roughly speaking, is Mongo's counterpart to Mutation.
>>>
>>> Thanks mucho
>>>
>>> --
>>> There are ways and there are ways,
>>>
>>> Geoffry Roberts
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Embedded Mutations: Is this kind of thing done?

Posted by Geoffry Roberts <th...@gmail.com>.

The use case is, I am walking a complex object graph and persisting what I
find there.  Said object graph in my case is always EMF (Eclipse Modeling
Framework) compliant.  An EMF graph can have in it references to--brace
yourself--a non-cross document containment reference.  When using Mongo,
these were persisted as a DBObject embedded into a containing DBObject.
I'm trying to decide whether I want to follow suit.

Any thoughts?

On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey <bu...@cloudera.com> wrote:

> Can you describe the use case more? Do you know what the purpose of the
> embedded changes is?
>
>
> On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <th...@gmail.com>wrote:
>
>> All,
>>
>> I am in the throes of converting some(else's) code from MongoDB to
>> Accumulo.  I am seeing a situation where one DBObject is being embedded
>> into another DBObject.  I see that Mutation supports a method called
>> getRow()  that returns a byte array.  I gather I can use this to achieve a
>> similar result if I were so inclined.
>>
>> Am I so inclined?  i.e. Is this the way we do things in Accumulo?
>>
>> DBObject, roughly speaking, is Mongo's counterpart to Mutation.
>>
>> Thanks mucho
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
>
>
> --
> Sean
>

-- 
There are ways and there are ways,

Geoffry Roberts

Re: Embedded Mutations: Is this kind of thing done?

Posted by Sean Busbey <bu...@cloudera.com>.

Can you describe the use case more? Do you know what the purpose of the
embedded changes is?


On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <th...@gmail.com> wrote:

> All,
>
> I am in the throes of converting some(else's) code from MongoDB to
> Accumulo.  I am seeing a situation where one DBObject is being embedded
> into another DBObject.  I see that Mutation supports a method called
> getRow()  that returns a byte array.  I gather I can use this to achieve a
> similar result if I were so inclined.
>
> Am I so inclined?  i.e. Is this the way we do things in Accumulo?
>
> DBObject, roughly speaking, is Mongo's counterpart to Mutation.
>
> Thanks mucho
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>



-- 
Sean