Posted to user@cassandra.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/10/01 15:27:37 UTC

Advice on correct storage configuration

Hi,

I wish to confirm whether my current mapping (storage) configuration
is suited to storing field data commonly extracted from web
pages.

My mapping can be seen here [0]. It specifies three column
families, namely parse (p), fetch (f) and super columns (sc), within the
webpage keyspace.

Each column family includes several fields, each commented for
clarity. The current CF configuration is as follows:

- fetch CF: 11 columns
- parse CF: 4 columns
- super column CF: 7 super columns

I am trying to ascertain why these 7 fields are configured as
super columns as opposed to standard columns, so I wonder if someone
could clarify whether such a configuration is suited to storing data
of this nature.
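To make the question concrete, here is a minimal sketch, using plain Python dictionaries rather than any Cassandra client, of how the same webpage data might sit in a super column family versus a standard column family with composite column names. The row key, the field names `outlinks` and `headers`, the sample values and both helper functions are hypothetical illustrations, not taken from the Nutch mapping [0].

```python
# In-memory illustration only; no Cassandra client is involved.
from typing import Dict, Tuple

# Super column family layout: row key -> super column -> subcolumn -> value.
SuperCF = Dict[str, Dict[str, Dict[str, str]]]

# Standard column family layout: row key -> column name -> value.
# Each column name is a (super column, subcolumn) tuple standing in for a
# composite column name.
StandardCF = Dict[str, Dict[Tuple[str, str], str]]


def flatten(super_cf: SuperCF) -> StandardCF:
    """Re-express every super column as composite-named standard columns."""
    flat: StandardCF = {}
    for row_key, super_columns in super_cf.items():
        row = flat.setdefault(row_key, {})
        for sc_name, subcolumns in super_columns.items():
            for sub_name, value in subcolumns.items():
                row[(sc_name, sub_name)] = value
    return flat


def read_super_column(flat: StandardCF, row_key: str, sc_name: str) -> Dict[str, str]:
    """Recover one 'super column' from a flattened row by its name prefix."""
    # sorted() mimics Cassandra keeping the columns of a row in name order.
    return {
        sub: value
        for (sc, sub), value in sorted(flat.get(row_key, {}).items())
        if sc == sc_name
    }


# Hypothetical map-valued webpage fields (names are illustrative only).
page: SuperCF = {
    "com.example.www:http/": {
        "outlinks": {"http://example.org/": "Example link"},
        "headers": {"Content-Type": "text/html", "Server": "nginx"},
    }
}

flat = flatten(page)
print(read_super_column(flat, "com.example.www:http/", "headers"))
# {'Content-Type': 'text/html', 'Server': 'nginx'}
```

Because columns within a Cassandra row are kept sorted by name, a shared composite prefix groups related columns much as a super column does. Composite columns were widely recommended over super columns at the time, partly because reading one subcolumn of a super column meant deserializing the whole super column. Which layout fits best still depends on how the fields are read.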

Thank you in advance. If this explanation is too vague, please
say so and I will be happy to expand on any aspect in an attempt to
fully understand the data model and the configuration.

Thank you

Lewis

[0] http://svn.apache.org/viewvc/nutch/branches/2.x/conf/gora-cassandra-mapping.xml?view=markup



-- 
Lewis

Re: Advice on correct storage configuration

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Dean,

Thanks for the feedback.

On Mon, Oct 1, 2012 at 3:12 PM, Hiller, Dean <De...@nrel.gov> wrote:
> What is really going to matter is what the applications are trying to read.
>  That is really the critical piece of context.  Without knowing what the
> application needs to read, it is very hard to design.
>

OK, as I suspected, I did not describe the data to be stored in
Cassandra, or the use cases it will be subject to, in enough detail
to get more substantial feedback. I didn't go into finer-grained
detail about typical requirements for CFs, columns and super columns
because webpage data can vary substantially between pages, hosts, etc.

Some context: we recently introduced a whole series of new
serializer options in the Apache Gora gora-cassandra module 0.2.1 [0];
however, I seem to be having problems populating Cassandra with certain
super column fields when mapping from webpages to super columns. I'm
trying to determine whether each field in the webpage --> Cassandra
mapping is correctly configured to store and retrieve the data
efficiently.
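One way to narrow down a field that does not populate correctly is to check that its serializer round-trips a sample value. The sketch below assumes two encodings that match Cassandra's built-in UTF8Type (UTF-8 bytes) and LongType (8-byte big-endian signed integer); the `FieldMapping` class, the field and column names, and the `check_round_trip` helper are all hypothetical and do not correspond to Gora's actual serializer classes.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class FieldMapping:
    """Hypothetical pairing of a webpage field with a column and a serializer."""
    family: str
    column: str
    to_bytes: Callable[[Any], bytes]
    from_bytes: Callable[[bytes], Any]


def utf8_to_bytes(value: str) -> bytes:
    return value.encode("utf-8")


def utf8_from_bytes(raw: bytes) -> str:
    return raw.decode("utf-8")


def long_to_bytes(value: int) -> bytes:
    # 8-byte big-endian signed integer, the layout of Cassandra's LongType.
    return struct.pack(">q", value)


def long_from_bytes(raw: bytes) -> int:
    return struct.unpack(">q", raw)[0]


# Hypothetical fields and columns; the real mapping is in the XML file.
MAPPINGS: Dict[str, FieldMapping] = {
    "baseUrl": FieldMapping("f", "base_url", utf8_to_bytes, utf8_from_bytes),
    "fetchTime": FieldMapping("f", "fetch_time", long_to_bytes, long_from_bytes),
}


def check_round_trip(record: Dict[str, Any]) -> Dict[str, bool]:
    """Serialize then deserialize each mapped field; report which survive."""
    results = {}
    for name, value in record.items():
        mapping = MAPPINGS[name]
        try:
            restored = mapping.from_bytes(mapping.to_bytes(value))
            results[name] = restored == value
        except (struct.error, UnicodeError, AttributeError, TypeError):
            # Wrong type or out-of-range value for this serializer.
            results[name] = False
    return results


print(check_round_trip({"baseUrl": "http://example.com/", "fetchTime": 1349100000000}))
# {'baseUrl': True, 'fetchTime': True}
```

A field that reports `False` here would fail before it ever reaches Cassandra, which separates serializer problems from mapping or schema problems.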

Thanks for your comments. I'll go away, have a more thorough think
and test various configs in an attempt to find the best option.

Thanks

Lewis

[0] http://svn.apache.org/repos/asf/gora/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/serializers/

Re: Advice on correct storage configuration

Posted by "Hiller, Dean" <De...@nrel.gov>.
What is really going to matter is what the applications are trying to read.
That is really the critical piece of context. Without knowing what the
application needs to read, it is very hard to design.

One example from a previous post that asked a great question was:
1. I need to get the last 100 requests no matter which user
2. I need to get the last 100 requests of a specific user

This gives anyone on the list an idea of how the model should look, AND
it is also why you will find many, many references on the web saying that
NoSQL is designed from the queries. (That said, I have seen one project where
denormalization caused them to write 1 MB of data on a single request, so there
is a balance... and boy, was that slow on the writes.)
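As a minimal sketch of designing from the queries, the two example queries can each be served by their own denormalized "table": one time-ordered list of all requests and one list per user. The `RequestLog` class, its methods and the `(timestamp, user, request)` entry shape are hypothetical, and plain Python lists stand in for column families. Every write goes to both tables, which is the write-side cost of denormalization.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# (timestamp, user, request); tuples sort by timestamp first, much as
# Cassandra keeps the columns of a row sorted by column name.
Entry = Tuple[int, str, str]


def _newest_first(entries: List[Entry], n: int) -> List[Entry]:
    """Return the n most recent entries, newest first."""
    if n <= 0:
        return []
    return entries[-n:][::-1]


class RequestLog:
    """Hypothetical query-first model: one 'table' per query to answer."""

    def __init__(self) -> None:
        # Query 1: last N requests regardless of user.
        self.all_requests: List[Entry] = []
        # Query 2: last N requests of a specific user.
        self.by_user: Dict[str, List[Entry]] = defaultdict(list)

    def record(self, timestamp: int, user: str, request: str) -> None:
        entry = (timestamp, user, request)
        # Denormalized write: the same request is stored once per query.
        bisect.insort(self.all_requests, entry)
        bisect.insort(self.by_user[user], entry)

    def last_requests(self, n: int) -> List[Entry]:
        return _newest_first(self.all_requests, n)

    def last_requests_for(self, user: str, n: int) -> List[Entry]:
        # .get() avoids creating an empty list for unknown users.
        return _newest_first(self.by_user.get(user, []), n)


log = RequestLog()
log.record(1, "alice", "GET /a")
log.record(3, "bob", "GET /b")
log.record(2, "alice", "GET /c")
print(log.last_requests(2))              # [(3, 'bob', 'GET /b'), (2, 'alice', 'GET /c')]
print(log.last_requests_for("alice", 100))  # [(2, 'alice', 'GET /c'), (1, 'alice', 'GET /a')]
```

Each read becomes a single slice of one sorted sequence, at the price of writing every request twice. In a real Cassandra schema the all-users row would usually be split into time buckets so that it does not grow without bound; that detail is left out here.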

Later,
Dean
