bulk ingest without mapred

Posted to user@accumulo.apache.org by pdread <pa...@siginttech.com> on 2014/04/08 15:40:08 UTC

Hi

I interface to an accumulo cloud (100s of nodes) which I don't maintain.
I'll try and keep this short: the interface app is used to ingest millions
of docs/week from various streams, some of which are required in near real time. A
problem came up where the tservers would not stay up and our ingest would
halt. Now the admins are working on fixing this but I'm not optimistic.
Others who have run into this tell me it's the use of Mutations that is
causing the problem and it will go away if I do bulk ingest. However,
mapreduce is way too slow to spin up and does not map to our architecture.

So here is what I have been trying to do. After much research I think I
should be able to bulk ingest if I create the RFile and feed it to
TableOperations.importDirectory(). I can create the RFile OK, at least I
think so, and I create the "failure" directory using Hadoop's file system. I
check that the failure directory is there and is a directory, but when I feed
it to the import I get an error in the accumulo master log saying that it
cannot find the failure directory. The interesting thing is I have
traced the code through the accumulo client and it checks successfully for the
load file and the failure directory. What am I doing wrong?
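
A rough sketch of that flow, assuming the 1.4-era internal FileOperations/FileSKVWriter
API and placeholder instance, credentials, table and paths, might look like this:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.conf.AccumuloConfiguration;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.file.FileOperations;
import org.apache.accumulo.core.file.FileSKVWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;

public class BulkLoadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // needs core-site.xml/hdfs-site.xml on the classpath
    FileSystem fs = FileSystem.get(conf);     // otherwise this silently becomes the local filesystem

    // Both directories live in HDFS; the failure directory must already exist (and be empty).
    Path loadDir = new Path("/tmp/bulk/entities/load");
    Path failDir = new Path("/tmp/bulk/entities_fails/failures");
    fs.mkdirs(loadDir);
    fs.mkdirs(failDir);

    // The ".rf" extension matters: the file factory picks the RFile writer from it.
    FileSKVWriter writer = FileOperations.getInstance().openWriter(
        new Path(loadDir, "entities.rf").toString(), fs, conf,
        AccumuloConfiguration.getDefaultConfiguration());
    writer.startDefaultLocalityGroup();
    // Keys must be appended in sorted order.
    writer.append(new Key(new Text("row1"), new Text("cf"), new Text("cq")),
        new Value("value".getBytes()));
    writer.close();

    // Instance name, zookeepers, user, password and table name are all placeholders.
    Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
        .getConnector("user", "password".getBytes());
    conn.tableOperations().importDirectory("entities", loadDir.toString(),
        failDir.toString(), false);
  }
}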

First the client error:

org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:290)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:258)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.importDirectory(TableOperationsImpl.java:945)
	at airs.medr.accumulo.server.table.EntityTable.writeEntities(EntityTable.java:130)

Now the master log exception:

2014-04-08 08:33:50,609 [thrift.MasterClientService$Processor] ERROR: Internal error processing waitForTableOperation
java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: bulk/entities_fails/failures
        at org.apache.accumulo.server.master.Master$MasterClientServiceHandler.waitForTableOperation(Master.java:1053)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:59)
        at $Proxy6.waitForTableOperation(Unknown Source)
        at org.apache.accumulo.core.master.thrift.MasterClientService$Processor$waitForTableOperation.process(MasterClientService.java:2004)
        at org.apache.accumulo.core.master.thrift.MasterClientService$Processor.process(MasterClientService.java:1472)
        at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:154)
        at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
        at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:202)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: File does not exist: bulk/entities_fails/failures
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:528)
        at org.apache.accumulo.server.trace.TraceFileSystem.getFileStatus(TraceFileSystem.java:797)
        at org.apache.accumulo.server.master.tableOps.BulkImport.call(BulkImport.java:157)
        at org.apache.accumulo.server.master.tableOps.BulkImport.call(BulkImport.java:110)
        at org.apache.accumulo.server.master.tableOps.TraceRepo.call(TraceRepo.java:65)
        at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:65)

 
Thoughts?

Thanks

Paul
  




Re: bulk ingest without mapred

Posted by William Slacum <wi...@accumulo.net>.
You should be creating those directories via a FileSystem object. Browse
your file system using the Namenode's webapp (if it's local, it'll usually
be http://localhost:50070) to see if those directories exist.
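
A minimal programmatic check along those lines (run inside a main() with the usual checked exceptions declared; the path below is just an example) could be:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();  // must pick up core-site.xml/hdfs-site.xml
FileSystem fs = FileSystem.get(conf);
System.out.println("filesystem: " + fs.getUri());  // expect hdfs://..., not file:///

Path failures = new Path("/tmp/bulk/entities_fails/failures");  // example path
fs.mkdirs(failures);
FileStatus status = fs.getFileStatus(failures);
System.out.println(status.getPath() + " isDir=" + status.isDir());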

On Tue, Apr 8, 2014 at 12:48 PM, pdread <pa...@siginttech.com> wrote:

> Josh
>
> So what you're telling me there is no programmatic way to add files to HDFS
> except via the command line?
>
> If thats the case then its a pretty sad system. The world doesn't run on
> the
> command line.
>
> Thanks
>
> Paul
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904p8916.html
> Sent from the Users mailing list archive at Nabble.com.
>

Re: bulk ingest without mapred

Posted by John Vines <vi...@apache.org>.
It looks like you're not actually adding the files to hdfs, you're just
putting them in the datanode block directories and expecting them to show
up. Look into
http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/fs/FileSystem.html
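
For example, an RFile written locally could be pushed into the HDFS load directory through that API rather than by writing under the dfs.data.dir directories (both paths below are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());
// Goes through the namenode, so HDFS actually knows about the file,
// unlike files dropped into the datanodes' local block directories.
fs.copyFromLocalFile(new Path("/local/scratch/entities.rf"),
    new Path("/tmp/bulk/entities/load/entities.rf"));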


On Tue, Apr 8, 2014 at 12:48 PM, pdread <pa...@siginttech.com> wrote:

> Josh
>
> So what you're telling me there is no programmatic way to add files to HDFS
> except via the command line?
>
> If thats the case then its a pretty sad system. The world doesn't run on
> the
> command line.
>
> Thanks
>
> Paul
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904p8916.html
> Sent from the Users mailing list archive at Nabble.com.
>

Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
Sean

Cool idea. Thanks. I just started looking at hadoop (programming-wise)
yesterday.

Paul




Re: bulk ingest without mapred

Posted by Sean Busbey <bu...@cloudera.com>.
Rather than hard-code the extension, you should rely on
o.a.a.core.file.rfile.RFile.EXTENSION.

Keep in mind that since it isn't in the publicly supported API, RFile-related
things might change with little to no warning. (It appears to have
held stable through 1.5 and 1.6, FWIW.)
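
Something like the following, assuming the constant lives in o.a.a.core.file.rfile.RFile as it does in 1.4/1.5 (its value is "rf" without the dot):

import org.apache.accumulo.core.file.rfile.RFile;
import org.apache.hadoop.fs.Path;

// Add the separator yourself; RFile.EXTENSION is just "rf".
Path loadFile = new Path("/tmp/bulk/entities/load", "entities." + RFile.EXTENSION);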


On Tue, Apr 8, 2014 at 10:36 AM, pdread <pa...@siginttech.com> wrote:

> Bill
>
> Hey thanks.. I was almost there...
>
> Yes I used the appropriate writer.
>
> Paul
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904p8924.html
> Sent from the Users mailing list archive at Nabble.com.
>



-- 
Sean

Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.

Thanks, one and all...it worked...now I can spin some tomcats up on the tserver
nodes and bulk-load the files, hopefully at a very high rate.

I may get to keep my job!






Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
Bill

Hey thanks.. I was almost there... 

Yes I used the appropriate writer.

Paul




Re: bulk ingest without mapred

Posted by William Slacum <wi...@accumulo.net>.
The extension is ".rf". Are you using an RFile.Writer?


On Tue, Apr 8, 2014 at 1:29 PM, pdread <pa...@siginttech.com> wrote:

> Josh
>
> As I had stated in one of my previous posts I am using FileSystem. I am
> using the code from the MapReduce bulk ingest without the MapReduce. I did
> feed the TableOperations.importDirectory a "load" directory and that is
> where if found the "entities.txt", in that load directory. So now the only
> question remains is what is the proper extension for the RFile. The
> "entities.txt" is a RFile which I created witht the appropriate Key/Value
> pairs that should load/match my table.
>
> Thanks
>
> Paul
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904p8922.html
> Sent from the Users mailing list archive at Nabble.com.
>

Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
Josh

As I had stated in one of my previous posts I am using FileSystem. I am
using the code from the MapReduce bulk ingest without the MapReduce. I did
feed TableOperations.importDirectory a "load" directory, and that is
where it found the "entities.txt", in that load directory. So now the only
question that remains is what is the proper extension for the RFile. The
"entities.txt" is an RFile which I created with the appropriate Key/Value
pairs that should load/match my table.

Thanks

Paul




Re: bulk ingest without mapred

Posted by Josh Elser <jo...@gmail.com>.
You're still a bit confused :). John's link is worth reading too.

The "FileSystem" I referred to originally is a class provided by HDFS.
It has multiple implementations (notably the LocalFileSystem and
DistributedFileSystem) which are returned when you use
FileSystem.get(...) based on the contents of the Configuration object
pulled in by the hdfs configuration files (core-site.xml and
hdfs-site.xml). A complete programmatic API is available via HDFS --
this is what Accumulo uses.

Regarding your most recent error, you should be providing a directory
of RFiles. Not sure what's in the text file you provided, or what you
intended it to do...
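
If the config files can't easily go on the classpath, they can also be added to the Configuration explicitly before calling FileSystem.get(...); the file locations below are only examples:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));  // example locations
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

FileSystem fs = FileSystem.get(conf);
// With fs.default.name pointing at the namenode this is a DistributedFileSystem;
// without it you silently get the LocalFileSystem (file:///).
System.out.println(fs.getUri());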

On Tue, Apr 8, 2014 at 12:59 PM, pdread <pa...@siginttech.com> wrote:
> Josh
>
> I want to thank you...you solved some of my problems, unintentionally but
> none the less.
>
> I added all the hadoop config files to the classpath and bingo, accumulo can
> now find the files...the only down side now is my load file must require a
> certain extension. I had used ".txt" but it complained with
>
> 2014-04-08 12:53:47,764 [tableOps.BulkImport] WARN :
> hdfs://localhost:9000/data/accu1/hdfs/tmp/bulk/entities/load/entities.txt
> does not have a valid extension, ignoring
>
> So now the hunt is on for the correct RFile extension.
>
> Thanks
>
> Paul
>
>
>
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904p8917.html
> Sent from the Users mailing list archive at Nabble.com.

Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
Josh

I want to thank you...you solved some of my problems, unintentionally but
nonetheless.

I added all the hadoop config files to the classpath and bingo, accumulo can
now find the files...the only downside now is that my load file requires a
certain extension. I had used ".txt" but it complained with

2014-04-08 12:53:47,764 [tableOps.BulkImport] WARN :
hdfs://localhost:9000/data/accu1/hdfs/tmp/bulk/entities/load/entities.txt
does not have a valid extension, ignoring

So now the hunt is on for the correct RFile extension.

Thanks

Paul




Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
Josh

So what you're telling me is that there is no programmatic way to add files to HDFS
except via the command line?

If that's the case then it's a pretty sad system. The world doesn't run on the
command line.

Thanks

Paul




Re: bulk ingest without mapred

Posted by Josh Elser <jo...@gmail.com>.
Paul,

It might be a good idea to re-read a basic overview on HDFS. You
shouldn't be modifying anything beneath the HDFS data directories.
Those directories on the local filesystem are used by HDFS to create a
distributed filesystem (which is what Accumulo is using).

The paths that you provide to Accumulo for bulk imports all exist on
that distributed filesystem, which should be modified using the hadoop
or hdfs executable on the command line, or the FileSystem API with your
hdfs-site.xml configuration file on the classpath.
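
As a quick sanity check of what the master will actually see, the load directory can be listed through that same API (the path below is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());  // with the site files on the classpath
for (FileStatus f : fs.listStatus(new Path("/tmp/bulk/entities/load"))) {
  System.out.println(f.getPath());  // the files a bulk import of this directory would consider
}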

On Tue, Apr 8, 2014 at 12:36 PM, pdread <pa...@siginttech.com> wrote:
> My hdfs-site.xml has the data nodes (space?) defined as
>
> <property>
>         <name>dfs.data.dir</name>
>         <value>/data/accu1/hdfs,/data/accu2/hdfs</value>
> </property>
>
> So I created the files/directories under /data/accu1/hdfs/tmp/bulk, and so
> they were.
>
> After more exploring I found the Hadoop code that is causing the problem,
> DFSClient.getFileInfo() is returning null.
>
>  public FileStatus getFileInfo(String src) throws IOException {
>     FileStatus fileStatus;
>
>     checkOpen();
>     try {
>       if (fileStatusCache != null) {
>         fileStatus = fileStatusCache.get(src);
>         if (fileStatus != FileStatusCache.nullFileStatus) {
>           return fileStatus;
>         }
>       }
>       fileStatus = namenodeProtocolProxy == null ?
> versionBasedGetFileInfo(src)
>           : methodBasedGetFileInfo(src);
>    if (fileStatusCache != null) {
>    fileStatusCache.set(src, fileStatus);
>    }
>
>    return fileStatus;
>     } catch(RemoteException re) {
>       throw re.unwrapRemoteException(AccessControlException.class);
>     }
>   }
>
> So I guess now why is this the case. I noticed that no logging was done to
> the hadoop logs, specifically the namenode and datanode logs. The DFSClient
> code refers to rpc calls which would suggest its connection into the hadoop
> system and not looking at the disk directly. Since I used FileSystem to do
> the file manipulation is there additional bookkeeping that needs to be done
> to let the "hadoop" system know there are files out there? In other words
> even though I used hadoop to create the files does "hadoop" proper know
> about them? If not then what bookkeeping has to be done to get them into the
> system.
>
> Just a guess here. But since the files are clear there and clearly available
> there must be something else at play.
>
> Thanks
>
> Paul
>
>
>
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904p8914.html
> Sent from the Users mailing list archive at Nabble.com.

Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
My hdfs-site.xml has the data nodes (space?) defined as

<property>
        <name>dfs.data.dir</name>
        <value>/data/accu1/hdfs,/data/accu2/hdfs</value>
</property>

So I created the files/directories under /data/accu1/hdfs/tmp/bulk, and there
they were.

After more exploring I found the Hadoop code that is causing the problem:
DFSClient.getFileInfo() is returning null.

  public FileStatus getFileInfo(String src) throws IOException {
    FileStatus fileStatus;

    checkOpen();
    try {
      if (fileStatusCache != null) {
        fileStatus = fileStatusCache.get(src);
        if (fileStatus != FileStatusCache.nullFileStatus) {
          return fileStatus;
        }
      }
      fileStatus = namenodeProtocolProxy == null
          ? versionBasedGetFileInfo(src)
          : methodBasedGetFileInfo(src);
      if (fileStatusCache != null) {
        fileStatusCache.set(src, fileStatus);
      }

      return fileStatus;
    } catch (RemoteException re) {
      throw re.unwrapRemoteException(AccessControlException.class);
    }
  }

So now I guess the question is why this is the case. I noticed that no logging was done to
the hadoop logs, specifically the namenode and datanode logs. The DFSClient
code refers to rpc calls, which would suggest it's connecting into the hadoop
system and not looking at the disk directly. Since I used FileSystem to do
the file manipulation, is there additional bookkeeping that needs to be done
to let the "hadoop" system know there are files out there? In other words,
even though I used hadoop to create the files, does "hadoop" proper know
about them? If not, then what bookkeeping has to be done to get them into the
system?

Just a guess here. But since the files are clearly there and clearly available,
there must be something else at play.

Thanks

Paul 




Re: bulk ingest without mapred

Posted by William Slacum <wi...@accumulo.net>.
Do you mean "/bulk/entities/load"?

On Tue, Apr 8, 2014 at 11:08 AM, pdread <pa...@siginttech.com> wrote:

> Ok I changed the locations to be the same location as the hdfs-site.xml,
> verified the files/directorys are there with the correct permissions, and
> get the same error. Accumulo master cannot find that location.
>
> As an aside my test server is setup with a single node Accumulo so I have
> complete control over this Accumulo. Running version 1.4.4. and hadoop
> 1.1.2. My test (eclipse) is run on the test server so all have access to
> the
> hdfs location.
>
> Thanks
>
> Paul
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904p8911.html
> Sent from the Users mailing list archive at Nabble.com.
>

Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
Ok, I changed the locations to be the same location as in hdfs-site.xml,
verified the files/directories are there with the correct permissions, and
get the same error. The Accumulo master cannot find that location.

As an aside, my test server is set up with a single-node Accumulo, so I have
complete control over this Accumulo. Running version 1.4.4 and hadoop
1.1.2. My test (Eclipse) is run on the test server so everything has access to the
hdfs location.

Thanks

Paul




Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
Sorry for answering my own post but that was a very dumb question..

I should just use the hdfs file location defined in the hdfs-site.xml....dah

Thanks

Paul




Re: bulk ingest without mapred

Posted by pdread <pa...@siginttech.com>.
This is where my limited knowledge of Accumulo/hadoop shows.

After running the mapreduce examples and seeing how they define the files and
directories, my assumption was that what to do with the files was under the
control of the Hadoop FileSystem. Now, sitting here thinking about it, I did
notice that mapreduce created a user area where it was placing files.

So, using the hadoop FileSystem, how would I define a working area that
accumulo proper would know about?

This is how I call importDirectory:

getAdmin().getConnector().tableOperations()
    .importDirectory(name(), "bulk/entities/load", "bulk/entities_fails/failures", false);

Thanks

Paul





Re: bulk ingest without mapred

Posted by William Slacum <wi...@accumulo.net>.
 java.io.FileNotFoundException: File does not exist:
bulk/entities_fails/failures

sticks out to me. It looks like a relative path. Where does that directory
exist on your file system?
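
Relative Paths in Hadoop are resolved against the working directory of whichever process evaluates them (on HDFS typically /user/<username>), so the master may not be looking where the client did. A sketch of the earlier call with absolute, placeholder paths:

// getAdmin() and name() are the helpers from the earlier snippet; the paths are only examples.
getAdmin().getConnector().tableOperations()
    .importDirectory(name(), "/tmp/bulk/entities/load", "/tmp/bulk/entities_fails/failures", false);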


On Tue, Apr 8, 2014 at 9:40 AM, pdread <pa...@siginttech.com> wrote:

> Hi
>
> I interface to an accumulo cloud (100s of nodes) which I don't maintain.
> I'll try and keep this short, the interface App is used to ingest millions
> of docs/week from various streams, some are required near real time. A
> problem came up where the tservers would not stay up and our ingest would
> halt. Now the admins are working on fixing this but I'm not optimistic.
> Others who have run into this tell me its the use of Mutations that is
> causing the problem and it will go away if I do bulk ingest. However
> mapreduce is way to slow to spin up and does not map to our arch.
>
> So here is what I have been trying to do. After much research I think I
> should be able to bulk ingest if I create the RFile and feed this to
> TableOperations.importDirectory(). I can create the RFile ok, at least I
> thinks so, I create the "failure" directory using hadoops' file system. I
> check that the failure directory is there and is a directory but when I
> feed
> it to the import I get an error over on the accumulo master log that the it
> can not find the failure directory. Now the interesting thing is I have
> traced the code thourgh the accumulo client it checks successfully for the
> load file and the failure directory. What am I doing wrong?
>
> First the client error:
>
> org.apache.accumulo.core.client.AccumuloException: Internal error
> processing
> waitForTableOperation
>         at
>
> org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:290)
>         at
>
> org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:258)
>         at
>
> org.apache.accumulo.core.client.admin.TableOperationsImpl.importDirectory(TableOperationsImpl.java:945)
>         at
>
> airs.medr.accumulo.server.table.EntityTable.writeEntities(EntityTable.java:130)
>
> Now the master log exception:
>
> 2014-04-08 08:33:50,609 [thrift.MasterClientService$Processor] ERROR:
> Internal error processing waitForTableOperation
> java.lang.RuntimeException: java.io.FileNotFoundException: File does not
> exist: bulk/entities_fails/failures
>         at
>
> org.apache.accumulo.server.master.Master$MasterClientServiceHandler.waitForTableOperation(Master.java:1053)
>         at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>         at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
>
> org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:59)
>         at $Proxy6.waitForTableOperation(Unknown Source)
>         at
>
> org.apache.accumulo.core.master.thrift.MasterClientService$Processor$waitForTableOperation.process(MasterClientService.java:2004)
>         at
>
> org.apache.accumulo.core.master.thrift.MasterClientService$Processor.process(MasterClientService.java:1472)
>         at
>
> org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:154)
>         at
>
> org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
>         at
>
> org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:202)
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException: File does not exist:
> bulk/entities_fails/failures
>         at
>
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:528)
>         at
>
> org.apache.accumulo.server.trace.TraceFileSystem.getFileStatus(TraceFileSystem.java:797)
>         at
>
> org.apache.accumulo.server.master.tableOps.BulkImport.call(BulkImport.java:157)
>         at
>
> org.apache.accumulo.server.master.tableOps.BulkImport.call(BulkImport.java:110)
>         at
>
> org.apache.accumulo.server.master.tableOps.TraceRepo.call(TraceRepo.java:65)
>         at
> org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:65)
>
>
> Thoughts?
>
> Thanks
>
> Paul
>
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904.html
> Sent from the Users mailing list archive at Nabble.com.
>