Posted to user@hbase.apache.org by SyedShoaib <sh...@hotmail.com> on 2010/06/24 14:25:46 UTC

How to search and make indexes in ColumnFamilies with unknown columns ?

Hi,

I am new to HBase and have worked with it for only a few days. I have two
questions. Any help is greatly appreciated; many thanks in advance.

1) Suppose I have a column family with an unknown number of columns. I want to
search for a value in this column family. The value can be present in any
column of the family. How do I search for a value across the whole column
family? To elaborate, please consider a simple scenario:

For example: A student can have any number of courses. The schema in HBase
could be:

Student_Courses  // Table Name
{
     Student:   // Column Family
     {
          ID:
          Name:
     }

     Courses:   // Column Family with any number of columns:
     {
         Course_1:  Maths
         Course_2:  Computer
         .
         .
         Course_n:  Science
     }
}

If I want to find all rows with the value “Maths” in any column of the
column family “Courses:”, what should I do? I can search for a value with
SingleColumnValueFilter by specifying the column family and qualifier, e.g.
"Student:Name". But how do I search for a value in the "Courses:" column
family, keeping in mind that I don't know how many columns it contains?


2) How do I build an index on this column family (“Courses:”)? I know
indexes are built on columns, but the number of columns is unknown! I can
build an index on "Student:Name". But what should I do if I want a single
index over the complete “Courses:” column family? Is it possible? It would
help me a lot for a search like SHOW ME ALL THE STUDENTS REGISTERED IN MATHS.

Regards,

-- 
View this message in context: http://old.nabble.com/How-to-search-and-make-indexes-in-ColumnFamilies-with-unknown-columns---tp28981932p28981932.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: How to search and make indexes in ColumnFamilies with unknown columns ?

Posted by Raghava Mutharaju <m....@gmail.com>.
I had a similar problem. I made the column labels the same as the values,
i.e. the data in both is identical.
For example, you could use Course_Math_1 as both the column label and the
value.

I did this because in some cases it was easier to use column filters, in
others I could use ValueFilter, and in some cases I used both at the same
time.

Even though you don't know the number of columns, I guess you would know the
likely labels of the columns in a column family? If so, you can use column
filters.
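
To make the "match on value, whatever the column" idea concrete, here is a minimal in-memory sketch in plain Java. It does not use the HBase client API: the nested map is only a stand-in for one column family (row key -> qualifier -> value), and the class name, method names, and sample rows are all hypothetical. In HBase itself, the ValueFilter mentioned above is the piece that compares cell values without naming a column, so a scan restricted to the Courses family with such a filter should behave roughly like this loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for one column family: row key -> (qualifier -> value).
// Illustrative model only; nothing here calls HBase.
class FamilyValueScan {

    // Return the row keys in which ANY column of the family holds the value.
    static List<String> rowsWithValue(Map<String, Map<String, String>> family, String wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : family.entrySet()) {
            // The qualifier is ignored; only the cell value is compared.
            if (row.getValue().containsValue(wanted)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Build a qualifier -> value map from alternating qualifier/value arguments.
    static Map<String, String> columns(String... kv) {
        Map<String, String> m = new TreeMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Hypothetical sample data in the Course_1..Course_n layout from the question.
    static Map<String, Map<String, String>> sampleCourses() {
        Map<String, Map<String, String>> courses = new TreeMap<>();
        courses.put("student1", columns("Course_1", "Maths", "Course_2", "Computer"));
        courses.put("student2", columns("Course_1", "Science"));
        courses.put("student3", columns("Course_1", "Computer", "Course_2", "Science", "Course_3", "Maths"));
        return courses;
    }

    public static void main(String[] args) {
        System.out.println(rowsWithValue(sampleCourses(), "Maths")); // prints [student1, student3]
    }
}
```

Note that this is still a full pass over every row; a filter only moves the comparison to the server side, it does not act as an index.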

Hope this helps.

Regards,
Raghava.

On Fri, Jun 25, 2010 at 5:44 AM, SyedShoaib <sh...@hotmail.com> wrote:

>
> Thank you very much for your help. If we keep courses as columns, the
> problem remains the same. The number of columns is unknown: there can be
> 1000 subjects in one row and only two in another. These subjects are
> unknown to us while we are programming against the client API; the user
> will insert them at runtime. So how can a filter in the client API search
> for a particular course in all columns of a column family? All the filters
> I have explored search only a single column of a column family at a time.
> That's the real problem.
>
> Many thanks for the help again.
> regards,

RE: How to search and make indexes in ColumnFamilies with unknown columns ?

Posted by "Hegner, Travis" <TH...@trilliumit.com>.
As I said before, I'm not very familiar with the API for scans, filters, etc.

If you are not worried about realtime access to that query, you could run a MapReduce job that takes in all rows and checks whether "Courses:Maths" exists in each row. If it exists, call "context.write("Maths", 1);" and then reduce over all rows to accumulate a total.

Even better, since you'd be running a MapReduce job anyway, call "context.write(course, 1);" for each course and reduce the output with the course name as the key. This will give you the total number of students in each course as a sorted list.
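
The counting job described above can be modeled in a few lines of plain Java. This is only a sketch of the map and reduce steps, not a Hadoop job: the nested map stands in for the Courses column family with course names as column names, and the class name, method names, and sample rows are hypothetical. A real job would read the rows through HBase's MapReduce integration rather than from memory.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory model of "emit (course, 1) per column, then sum per course".
class CourseCountSketch {

    static Map<String, Integer> countByCourse(Map<String, Map<String, String>> coursesFamily) {
        // A TreeMap keeps keys sorted, mirroring the sorted-by-key reducer output.
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, String> columns : coursesFamily.values()) {
            for (String course : columns.keySet()) {
                // "map" emits (course, 1); "reduce" adds the ones together.
                totals.merge(course, 1, Integer::sum);
            }
        }
        return totals;
    }

    // Build a qualifier -> value map from alternating qualifier/value arguments.
    static Map<String, String> columns(String... kv) {
        Map<String, String> m = new TreeMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Hypothetical rows: column name = course, value = term.
    static Map<String, Map<String, String>> sampleCourses() {
        Map<String, Map<String, String>> courses = new TreeMap<>();
        courses.put("student1", columns("Maths", "2010_Fall", "Computer", "2011_Spring"));
        courses.put("student2", columns("Science", "2011_Spring", "Maths", "2011_Spring"));
        courses.put("student3", columns("Computer", "2010_Fall", "Science", "2010_Fall", "Maths", "2010_Fall"));
        return courses;
    }

    public static void main(String[] args) {
        System.out.println(countByCourse(sampleCourses())); // prints {Computer=2, Maths=3, Science=2}
    }
}
```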

If you want realtime access, consider a secondary table as an index, kept up to date either by the inserting application or by a scheduled MapReduce job.
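
One way to picture such a secondary index table is sketched below in plain Java. It assumes, purely for illustration, that each index row key is the course name, a NUL separator, and the student ID, so that all students of one course sit next to each other in sorted key order and can be read with a single range scan. The sorted set stands in for the index table, and the class name, method names, and key layout are hypothetical, not something HBase prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// In-memory model of an index table kept up to date by the writing application.
class CourseIndexSketch {

    // Assumed separator; course names must not contain it.
    private static final char SEP = '\u0000';

    // Sorted index row keys of the form course + SEP + studentId.
    private final TreeSet<String> indexRows = new TreeSet<>();

    // Called whenever the application stores a course for a student.
    void enroll(String studentId, String course) {
        // The write to the main Student_Courses table would happen here as well.
        indexRows.add(course + SEP + studentId);
    }

    // Range scan over [course + SEP, course + (SEP + 1)): every key for that course.
    List<String> studentsIn(String course) {
        String start = course + SEP;
        String stop = course + (char) (SEP + 1);
        List<String> students = new ArrayList<>();
        for (String key : indexRows.subSet(start, stop)) {
            students.add(key.substring(start.length()));
        }
        return students;
    }

    public static void main(String[] args) {
        CourseIndexSketch index = new CourseIndexSketch();
        index.enroll("student2", "Maths");
        index.enroll("student1", "Maths");
        index.enroll("student1", "Science");
        System.out.println(index.studentsIn("Maths"));
    }
}
```

The trade-off is that every write now touches two tables, and the two can drift apart if one write fails, which is why a periodic rebuild job is a reasonable complement.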

Thanks,

Travis Hegner
http://www.travishegner.com/


RE: How to search and make indexes in ColumnFamilies with unknown columns ?

Posted by SyedShoaib <sh...@hotmail.com>.
Thank you very much for your help. If we keep courses as columns, the problem
remains the same. The number of columns is unknown: there can be 1000
subjects in one row and only two in another. These subjects are unknown to us
while we are programming against the client API; the user will insert them at
runtime. So how can a filter in the client API search for a particular course
in all columns of a column family? All the filters I have explored search
only a single column of a column family at a time. That's the real problem.

Many thanks for the help again.
regards,





Re: Hbase cluster Monitoring

Posted by Jean-Daniel Cryans <jd...@apache.org>.
We use Ganglia at StumbleUpon. To enable metrics, see
http://hbase.apache.org/docs/r0.20.5/metrics.html

J-D

On Fri, Jun 25, 2010 at 11:52 AM, Palaniappan Thiyagarajan
<pt...@cashedge.com> wrote:
> Hi,
>
> I would like to know what kind of monitoring you use in your production environment, and whether any monitoring scripts can be shared.
>
>
> Thanks
> Palani
>

Hbase cluster Monitoring

Posted by Palaniappan Thiyagarajan <pt...@cashedge.com>.
Hi,

I would like to know what kind of monitoring you use in your production environment, and whether any monitoring scripts can be shared.


Thanks
Palani

RE: How to search and make indexes in ColumnFamilies with unknown columns ?

Posted by "Hegner, Travis" <TH...@trilliumit.com>.
I'm not an expert by any means, but I wonder whether you could store the course name/type as the column name, with some arbitrary but useful value as the value. For example:

Student_Courses  // Table Name
{
     Student:   // Column Family
     {
          ID => 12345678
          Name => John Smith
     }

     Courses:   // Column Family with any number of columns:
     {
         Maths => 2010_Fall
         Computer => 2011_Spring
         .
         .
         Science => 2011_Spring
     }
}

The API may be better suited to filtering by column name than by value, but as I said, I'm no expert, and I have very little experience filtering via the API.

Assuming the filter works correctly, you could simply ignore the retrieved value if it isn't needed. Be careful about storing too large a value, though, as that could affect performance. This is one of the beauties of a column-oriented schema: you can store useful, valuable information in the column name itself.

I do know that with this type of schema, the columns would be accessed like:

get(<row_id>, "Courses:Maths"[, <version>]);

or something to that effect anyway...
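
As a rough illustration of why this layout helps, the plain-Java sketch below models the Courses family as a map from column name (the course) to value (the term). Finding the students in a course then becomes a check for a column's existence rather than a comparison of every value. Nothing here calls HBase; the class name, method names, and the second sample row are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of the "course name as column name" schema.
class CourseAsQualifier {

    // Rows whose Courses family contains a column named after the course.
    static List<String> rowsWithColumn(Map<String, Map<String, String>> courses, String course) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : courses.entrySet()) {
            if (row.getValue().containsKey(course)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Rough analogue of get(<row_id>, "Courses:<course>"); returns null when absent.
    static String get(Map<String, Map<String, String>> courses, String rowId, String course) {
        Map<String, String> columns = courses.get(rowId);
        return columns == null ? null : columns.get(course);
    }

    // Build a qualifier -> value map from alternating qualifier/value arguments.
    static Map<String, String> columns(String... kv) {
        Map<String, String> m = new TreeMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // First row follows the example schema above; the second row is made up.
    static Map<String, Map<String, String>> sample() {
        Map<String, Map<String, String>> courses = new TreeMap<>();
        courses.put("12345678", columns("Maths", "2010_Fall", "Computer", "2011_Spring", "Science", "2011_Spring"));
        courses.put("87654321", columns("Computer", "2010_Fall"));
        return courses;
    }

    public static void main(String[] args) {
        System.out.println(rowsWithColumn(sample(), "Maths"));
        System.out.println(get(sample(), "12345678", "Maths"));
    }
}
```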

Hope This Helps, Good Luck!

Travis Hegner
http://www.travishegner.com/
