Posted to user@hbase.apache.org by Raghava Mutharaju <m....@gmail.com> on 2010/06/25 21:33:49 UTC

Re: How to search and make indexes in ColumnFamilies with unknown columns ?

I had a similar problem. I made the column labels the same as the values,
i.e. the data in both of them is identical.
For example, perhaps you can use Course_Math_1 as both the label and the value.

I did this because in some cases it was easier to use column filters, in
others I could use ValueFilter, and there were cases when I used both at
the same time.
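
To illustrate the idea, here is a minimal in-memory sketch in plain Java. It
does not use the HBase client API: the class MirroredColumnSketch, its
methods, and the sample rows are all made up for illustration. It only models
why mirroring helps: when the label equals the value, a match on the column
label and a match on the cell value select the same rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one column family; not HBase code.
class MirroredColumnSketch {

    // Build the cells of one row so that every label equals its value.
    static Map<String, String> mirrored(String... courses) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (String course : courses) {
            cells.put(course, course);
        }
        return cells;
    }

    // rowKey -> (column label -> cell value); sample data is invented.
    static Map<String, Map<String, String>> sampleCourses() {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("student1", mirrored("Course_Math_1", "Course_CS_1"));
        table.put("student2", mirrored("Course_Science_1"));
        table.put("student3", mirrored("Course_Math_1"));
        return table;
    }

    // Value-style match: a row qualifies if any cell holds the value.
    static List<String> rowsByValue(Map<String, Map<String, String>> table, String value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (row.getValue().containsValue(value)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Label-style match: a row qualifies if it has a column with that label.
    static List<String> rowsByLabel(Map<String, Map<String, String>> table, String label) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (row.getValue().containsKey(label)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = sampleCourses();
        System.out.println(rowsByValue(table, "Course_Math_1")); // [student1, student3]
        System.out.println(rowsByLabel(table, "Course_Math_1")); // [student1, student3]
    }
}
```

Neither lookup needs to know how many columns a row has, which is the point
of the approach.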

Even though you wouldn't know the number of columns, I guess you would
know the likely labels of the columns in a column family? If so, you can
use the column filters, can't you?
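
As a rough sketch of matching on likely labels, the plain-Java model below
keeps only the cells whose column label starts with a known prefix. It is an
in-memory illustration, not the HBase client API; the class LabelPrefixSketch,
the method names, and the sample labels are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of label-based filtering; not HBase code.
class LabelPrefixSketch {

    // rowKey -> (column label -> cell value); sample data is invented.
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        Map<String, String> s1 = new TreeMap<>();
        s1.put("Course_Math_1", "Course_Math_1");
        s1.put("Course_CS_1", "Course_CS_1");
        table.put("student1", s1);
        Map<String, String> s2 = new TreeMap<>();
        s2.put("Course_Science_1", "Course_Science_1");
        table.put("student2", s2);
        Map<String, String> s3 = new TreeMap<>();
        s3.put("Course_Math_2", "Course_Math_2");
        table.put("student3", s3);
        return table;
    }

    // Keep, per row, only the cells whose label starts with the prefix.
    // Rows without any matching cell are left out of the result.
    static Map<String, Map<String, String>> cellsWithLabelPrefix(
            Map<String, Map<String, String>> table, String prefix) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            Map<String, String> matched = new TreeMap<>();
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (cell.getKey().startsWith(prefix)) {
                    matched.put(cell.getKey(), cell.getValue());
                }
            }
            if (!matched.isEmpty()) {
                result.put(row.getKey(), matched);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows that have at least one column labelled Course_Math*.
        System.out.println(cellsWithLabelPrefix(sampleTable(), "Course_Math").keySet());
    }
}
```

The number of columns per row never enters the lookup; only the label
convention has to be known in advance.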

Hope this helps.

Regards,
Raghava.

On Fri, Jun 25, 2010 at 5:44 AM, SyedShoaib <sh...@hotmail.com> wrote:

>
> Thank you very much for your help. If we keep courses as columns, the
> problem remains the same. The number of columns is unknown: there can be
> 1000 subjects in one row and only two subjects in another row. These
> subjects are unknown to us while we are programming against the client
> API; the user will insert them at runtime. How, then, will a Filter in the
> client API search for a particular course across all columns of a
> ColumnFamily? All the filters I have explored search only a single column
> of a ColumnFamily at a time. That's the real problem.
>
> Many thanks for the help again.
> regards,
>
>
>
> Hegner, Travis wrote:
> >
> > I'm not an expert by any means, but I wonder if you were to store the
> > course name/type as the column name, and some arbitrary but useful value
> > as the value, for example:
> >
> > Student_Courses  // Table Name
> > {
> >      Student:   // Column Family
> >      {
> >           ID => 12345678
> >           Name => John Smith
> >      }
> >
> >      Courses:   // Column Family with any number of columns:
> >      {
> >          Maths => 2010_Fall
> >          Computer => 2011_Spring
> >          .
> >          .
> >          Science => 2011_Spring
> >      }
> > }
> >
> > The API may be better suited to handle filtering by column name, rather
> > than value, but as I said, I'm no expert, and I have very little
> > experience filtering via the API.
> >
> > Assuming the filter works correctly, you could simply ignore the value
> > retrieved if it wasn't needed. Be careful about putting too large a
> > value in, though, as that could affect performance. This is one of the
> > beauties of a column-oriented schema: you can store useful, valuable
> > information as a column name.
> >
> > I do know that with this type of schema, the columns would be accessed
> > like:
> >
> > get(<row_id>, "Courses:Maths"[, <version>]);
> >
> > or something to that effect anyway...
> >
> > Hope This Helps, Good Luck!
> >
> > Travis Hegner
> > http://www.travishegner.com/
> >
> > -----Original Message-----
> > From: SyedShoaib [mailto:shoaib_talib@hotmail.com]
> > Sent: Thursday, June 24, 2010 8:26 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: How to search and make indexes in ColumnFamilies with unknown
> > columns ?
> >
> >
> > Hi,
> >
> > I am new to HBase and have worked with it for only a few days. I have two
> > questions. Any kind of help is fully appreciated, and many thanks in
> > advance.
> >
> > 1) Suppose I have a columnFamily with an unknown number of columns. I want
> > to search for a value in this columnFamily. That value can be present in
> > any column of this columnFamily. How will I search for a value across the
> > whole columnFamily? For further elaboration, please consider a simple
> > scenario:
> >
> > For example: a student can have any number of courses. The schema in HBase
> > could be:
> >
> > Student_Courses  // Table Name
> > {
> >      Student:   // Column Family
> >      {
> >           ID:
> >           Name:
> >      }
> >
> >      Courses:   // Column Family with any number of columns:
> >      {
> >          Course_1:  Maths
> >          Course_2:  Computer
> >          .
> >          .
> >          Course_n:  Science
> >      }
> > }
> >
> > If I want to find all rows with the value “Maths” in any column inside
> > the columnFamily “Courses:”, what do I do? I can search for a value
> > through SingleColumnValueFilter by specifying the ColumnFamily and column,
> > e.g. "Student:Name". But how will I search for a value in the "Courses:"
> > columnFamily, keeping in mind that I don't know how many columns it has?
> >
> >
> > 2) How will I make an index on this columnFamily (“Courses:”)? I know
> > indexes are made on columns, but the number of columns is unknown! I can
> > make an index on "Student:Name". But what do I do if I want a single
> > index on the complete “Courses:” ColumnFamily? Is it possible? It would
> > help me a lot during a search like SHOW ME ALL THE STUDENTS REGISTERED IN
> > MATHS.
> >
> > Regards,
> >
> > --
> > View this message in context:
> > http://old.nabble.com/How-to-search-and-make-indexes-in-ColumnFamilies-with-unknown-columns---tp28981932p28981932.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/How-to-search-and-make-indexes-in-ColumnFamilies-with-unknown-columns---tp28981932p28990537.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>