Posted to issues@hbase.apache.org by petri koski <mo...@gmail.com> on 2012/10/31 16:19:32 UTC

HBase 0.9x + Hadoop: scan returns too few rows

I am totally stuck here.

I have a table called url with three column families: p, i and s.

The url table has 8300 rows.

The rows are inserted like this:

key:xxxxx columnfamily:p: value:<webpage content>
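
The puts are done roughly like this (a sketch: conf, rowKey and pageContent
stand in for my real variables, and the qualifier under p is empty):

HTable table = new HTable(conf, "url");
Put put = new Put(Bytes.toBytes(rowKey));
// page content goes into family p under an empty qualifier
put.add(Bytes.toBytes("p"), Bytes.toBytes(""), Bytes.toBytes(pageContent));
table.put(put);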

Now, when I run a scan from Hadoop, I add the right column family (p) and
try to process all 8300 rows in one map phase (I use the MultithreadedMapper
patch and synchronize inside the mapper, etc.). I get only 566 rows as map
input, NOT the 8300 rows I am expecting to process.
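
For reference, the job setup looks roughly like this (a sketch: MyMapper and
the job variable stand in for my real class and Job object, and I left out
the multithreaded-mapper wiring):

Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("p"));   // scan only the p family
scan.setCaching(500);                 // batch rows per RPC for the MR scan
scan.setCacheBlocks(false);           // don't fill the block cache from MR
TableMapReduceUtil.initTableMapperJob(
        "url",                        // input table
        scan,                         // scan restricted to family p
        MyMapper.class,               // mapper class (placeholder name)
        null, null,                   // mapper output key/value classes
        job);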

What could possibly be wrong? I process the input in my mapper like this:

public void map(ImmutableBytesWritable row, Result values, Context context)
        throws IOException {

    for (KeyValue kv : values.raw()) {
        // row key and cell value of the current KeyValue
        String i = new String(kv.getRow());
        String p = new String(kv.getValue());

        // ... do something with p ...

        // savecontent(...) writes the processed result back to the
        // url table, family ilinks
    }
}

Just to give you an idea... What could be the reason? I don't set any start
or stop rows.

Yours,

Petri

Re: HBase 0.9x + Hadoop: scan returns too few rows

Posted by Jean-Daniel Cryans <jd...@apache.org>.
issues@ is the wrong mailing list for this; I'm putting it in BCC and
replying to user@ now.

The first guess that comes to mind is that not all of your rows have data
in p:, if that is really the only family you are scanning in your MR job.

Try doing a scan of p: in the shell and see what comes out.
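
For example, something like this (using the table name url from your message):

hbase> count 'url'
hbase> scan 'url', {COLUMNS => 'p'}

If count says 8300 but the scan over p: only shows ~566 rows, then the other
rows simply have nothing in p: and your MR job is seeing exactly what's there.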

J-D

On Wed, Oct 31, 2012 at 8:19 AM, petri koski <mo...@gmail.com> wrote:
> I am totally stuck here.
>
> I have a table called url with three column families: p, i and s.
>
> The url table has 8300 rows.
>
> The rows are inserted like this:
>
> key:xxxxx columnfamily:p: value:<webpage content>
>
> Now, when I run a scan from Hadoop, I add the right column family (p) and
> try to process all 8300 rows in one map phase (I use the MultithreadedMapper
> patch and synchronize inside the mapper, etc.). I get only 566 rows as map
> input, NOT the 8300 rows I am expecting to process.
>
> What could possibly be wrong? I process the input in my mapper like this:
>
> public void map(ImmutableBytesWritable row, Result values, Context context)
>         throws IOException {
>
>     for (KeyValue kv : values.raw()) {
>         // row key and cell value of the current KeyValue
>         String i = new String(kv.getRow());
>         String p = new String(kv.getValue());
>
>         // ... do something with p ...
>
>         // savecontent(...) writes the processed result back to the
>         // url table, family ilinks
>     }
> }
>
> Just to give you an idea... What could be the reason? I don't set any
> start or stop rows.
>
> Yours,
>
> Petri
