Posted to user@gora.apache.org by SJC Multimedia <sj...@gmail.com> on 2017/10/30 18:08:32 UTC

Nutch + Gora + Hbase client ( BigTable )

Hi

I am trying out Google BigTable as a Nutch backend, although there is no
official documentation saying that it is supported. However, I don't see any
reason why it would not be possible, so I am giving it a shot.

I have upgraded Gora to version 0.8 (with Nutch 2.3.1) and the JDK to 1.8.

Currently, using bigtable-hbase-1.x-hadoop-1.0.0-pre3.jar, the call to
Bigtable fails while performing flushCommits as part of the inject
operation. I do see the table getting created on the BigTable side, but the
table is empty.

The exception by itself is not enough to give us an answer. The
UnsupportedOperationException is a bit strange; I'm not sure where it is
coming from. Here is a guide on getting more information from a
RetriesExhaustedWithDetailsException, since neither Gora nor
BigtableBufferedMutator is under our control:
<https://cloud.google.com/bigtable/docs/hbase-batch-exceptions>
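
As a rough illustration of what the linked guide is about: a
RetriesExhaustedWithDetailsException carries one cause per failed action, and
listing those causes one by one tends to say more than the one-line summary.
The sketch below is only an assumption of how that could look. BatchCauses and
describe are hypothetical names, and the code takes a plain List<Throwable> so
that it compiles with the JDK alone; with HBase on the classpath the list
would come from the exception's getCauses() method.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: summarise the per-action causes of a failed batch.
// Assumption: in real code the list is obtained from
// RetriesExhaustedWithDetailsException.getCauses(); here it is passed in
// directly so that no HBase dependency is needed.
final class BatchCauses {

    // One line per failed action: index, exception class, message, and the
    // top stack frame (where the exception was thrown), if one is recorded.
    static List<String> describe(List<Throwable> causes) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < causes.size(); i++) {
            Throwable t = causes.get(i);
            StackTraceElement[] frames = t.getStackTrace();
            String origin = frames.length > 0
                    ? frames[0].toString()
                    : "(no stack frames)";
            lines.add("action " + i + ": " + t.getClass().getName()
                    + ": " + t.getMessage() + " at " + origin);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in data; a real run would use the causes of the caught exception.
        List<Throwable> causes = new ArrayList<>();
        causes.add(new UnsupportedOperationException("example cause"));
        for (String line : describe(causes)) {
            System.out.println(line);
        }
    }
}
```

The top stack frame is the interesting part here, since it would show which
class actually throws the UnsupportedOperationException.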

This seems like a client-side thing, so this is likely some strange
interaction between BigTable library and Gora.

*Any suggestion on how exactly to figure out what is the issue here?*


Here is grpc session info:

2017-10-27 17:37:51,462 INFO  grpc.BigtableSession - Bigtable options:
BigtableOptions{dataHost=bigtable.googleapis.com, tableAdminHost=
bigtableadmin.googleapis.com, instanceAdminHost=bigtableadmin.googleapis.com,
projectId=xxxxxx-dev, instanceId=big-table-nutch-test,
userAgent=hbase-1.2.0-cdh5.13.0, credentialType=DefaultCredentials,
port=443, dataChannelCount=20, retryOptions=RetryOptions{retriesEnabled=true,
allowRetriesWithoutTimestamp=false, statusToRetryOn=[INTERNAL,
DEADLINE_EXCEEDED, ABORTED, UNAUTHENTICATED, UNAVAILABLE],
initialBackoffMillis=5, maxElapsedBackoffMillis=60000,
backoffMultiplier=2.0, streamingBufferSize=60,
readPartialRowTimeoutMillis=60000, maxScanTimeoutRetries=3},
bulkOptions=BulkOptions{asyncMutatorCount=2, useBulkApi=true,
bulkMaxKeyCount=25, bulkMaxRequestSize=1048576, autoflushMs=0,
maxInflightRpcs=1000, maxMemory=93218406, enableBulkMutationThrottling=false,
bulkMutationRpcTargetMs=100},
callOptionsConfig=CallOptionsConfig{useTimeout=false,
shortRpcTimeoutMs=60000, longRpcTimeoutMs=600000},
usePlaintextNegotiation=false}.

Getting following error:

2017-10-27 17:37:51,660 ERROR store.HBaseStore - Failed 1 action: UnsupportedOperationException: 1 time, servers with issues: bigtable.googleapis.com,
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: UnsupportedOperationException: 1 time, servers with issues: bigtable.googleapis.com,
    at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:271)
    at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:198)
    at org.apache.gora.hbase.store.HBaseTableConnection.flushCommits(HBaseTableConnection.java:115)
    at org.apache.gora.hbase.store.HBaseTableConnection.close(HBaseTableConnection.java:127)
    at org.apache.gora.hbase.store.HBaseStore.close(HBaseStore.java:819)
    at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:56)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Thanks
Akshar

Re: Nutch + Gora + Hbase client ( BigTable )

Posted by Alfonso Nishikawa <al...@gmail.com>.
Hi, Akshar.

I usually use Eclipse. If you use it too, this is quite simple, since the
project is mavenized.
First, clone the repository https://github.com/apache/gora.git
After that, if you decide to stick to version 0.8, check out that tag: git
checkout tags/apache-gora-0.8
(but you can also work on master)

In Eclipse:

    File > Import... > Maven > Existing Maven Projects

Then browse to the project folder and confirm the selection (Accept, OK, or
whatever the button is called). Select all projects and click Finish. This
imports the main Maven project and its subprojects.
You can close all of them except gora-core and gora-hbase.

Modify gora-hbase to your needs:

* I am assuming the bigtable-hbase library is a substitute for hbase-client
(as a proxy)?

  <root>/pom.xml declares the versions of all dependencies, so you can add
the bigtable-hbase-1.x-hadoop one there (as in my first answer).
  <root>/gora-hbase/pom.xml declares which dependencies the module uses, but
without versions. Here I guess you have to substitute hbase-client with
the bigtable one.

* Modify HBaseTableConnection

After finishing your modifications, you can install gora-hbase in the local
Maven repository so that Nutch will pick up your version:

1- Execute on the console: mvn -DskipTests -pl gora-hbase -am install
2- Delete the Ivy cache (I don't remember exactly which folder to delete; I
always forget it, since I only use Ivy with Nutch).
3- Compile Nutch again so that it picks up your compiled Gora.

I am writing this from memory, so if some step is wrong, let me know. And for
any questions, here we are.
About the mvn execution: as you can see, I am skipping the tests, since they
bring up a standalone HBase instance and test against it, so they will not
work for you.

Thank you for giving it a try.

Regards,

Alfonso


2017-11-06 21:26 GMT-01:00 SJC Multimedia <sj...@gmail.com>:

> Thanks for the suggestion. I am very interested in trying it out. Can you
> please suggest the steps needed to build Gora from source so that I can
> modify HBaseTableConnection?
>
> I already have dependency for bigtable and hbase-common 1.2.3 in my ivy
> file.
>
> Thanks
> Akshar

Re: Nutch + Gora + Hbase client ( BigTable )

Posted by SJC Multimedia <sj...@gmail.com>.
Thanks for the suggestion. I am very interested in trying it out. Can you
please suggest the steps needed to build Gora from source so that I can
modify HBaseTableConnection?

I already have the dependencies for bigtable and hbase-common 1.2.3 in my
ivy file.

Thanks
Akshar

On Tue, Oct 31, 2017 at 12:27 PM, Alfonso Nishikawa <
alfonso.nishikawa@gmail.com> wrote:

> Hi, Akshar.
>
> Most probably you are the first one to try this. I have never used
> Google Cloud Platform, but in case there is no answer to your question,
> my only suggestion would be to clone the repository [1], try it with
> the bigtable dependency:
>
>       <dependency>
>         <groupId>com.google.cloud.bigtable</groupId>
>         <artifactId>bigtable-hbase-1.x-hadoop</artifactId>
>         <version>1.0.0-pre3</version>
>       </dependency>
>
> and add some "catch" at HBaseTableConnection class [2] to see what is
> happening there.
>
> I know this is not a solution, but I am at your disposal for any question
> about this approach (when I know the answer, of course).
>
> [1] https://github.com/apache/gora/tree/apache-gora-0.8
> [2] https://github.com/apache/gora/blob/apache-gora-0.8/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseTableConnection.java#L115
>
> Regards,
>
> Alfonso Nishikawa

Re: Nutch + Gora + Hbase client ( BigTable )

Posted by lewis john mcgibbney <le...@apache.org>.
ACK, we only really try to support Apache distributions of the various
libraries. I think Alfonso's suggestion is best. Please keep in mind,
however, that Gora depends upon Hadoop 2.X now... you may also run into
some issues there.
Lewis


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney

Re: Nutch + Gora + Hbase client ( BigTable )

Posted by Alfonso Nishikawa <al...@gmail.com>.
Hi, Akshar.

Most probably you are the first one to try this. I have never used
Google Cloud Platform, but in case there is no answer to your question, my
only suggestion would be to clone the repository [1], try it with the
bigtable dependency:

      <dependency>
        <groupId>com.google.cloud.bigtable</groupId>
        <artifactId>bigtable-hbase-1.x-hadoop</artifactId>
        <version>1.0.0-pre3</version>
      </dependency>

and add some "catch" at HBaseTableConnection class [2] to see what is
happening there.
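
A temporary catch of that kind could, for example, log the innermost cause and
the complete stack trace instead of only the summary line. This is a minimal
sketch using only the JDK; FailureDetails, rootCause and fullTrace are
hypothetical helper names, not part of Gora or the Bigtable client, and where
exactly to call them inside HBaseTableConnection is left open.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical debugging helpers for a temporary catch block.
final class FailureDetails {

    // Follow getCause() links to the innermost exception.
    // The identity set guards against a cyclic cause chain.
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen =
                Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    // Render the full stack trace, including "Caused by" sections, as a
    // string that can be handed to whatever logger is in use.
    static String fullTrace(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Stand-in exceptions; a real run would use the caught exception.
        Throwable inner = new UnsupportedOperationException("example root cause");
        Throwable outer = new RuntimeException("flush failed", inner);
        System.out.println("root cause: " + rootCause(outer));
        System.out.println(fullTrace(outer));
    }
}
```

For a batch failure, each per-action cause would be passed to these helpers in
turn.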

I know this is not a solution, but I am at your disposal for any question
about this approach (when I know the answer, of course).

[1] https://github.com/apache/gora/tree/apache-gora-0.8
[2]
https://github.com/apache/gora/blob/apache-gora-0.8/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseTableConnection.java#L115

Regards,

Alfonso Nishikawa


