Posted to user@accumulo.apache.org by Jesse McConnell <je...@gmail.com> on 2011/10/14 22:27:55 UTC

Fwd: TableOperations import directory

I had been advised to send this to the dev list, but I'll cc users as well
----

What is the format of the files in the directory that you are supposed
to be importing from?

I am currently getting a lovely 'unknown result' error message out of
the ClientService.

We are currently writing rfiles into the directory we are importing
from....any pointers?

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
right, and I saw no data that way either...

so forcing the compaction should update the stats in the web ui... my
thought on the scan was that perhaps I didn't have permissions set on
the data correctly, and I was banking on the web ui for decent stats to
check success...

so I have already tried the compaction + webui check :/

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Mon, Oct 17, 2011 at 12:23, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> On Monday, October 17, 2011 1:16:49 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>> Well, it seems like it is successful, only there is no data; going to
>> the web ui shows no new data in the table. So it _appears_ that
>> it is failing silently on something.
>
> The monitor page entry counts are only estimates, and at this point they do not include newly bulk imported data.  (This is so the tserver doesn't have to open every file to bring it online.)  You should be able to see the data when you do a scan, and when the files get compacted you'll see the updated counts on the monitor page.  You can trigger compaction by running "compact -t tablename" in the shell.
>
> Billie
>
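
(For reference, a minimal shell session sketch of the check described above;
"mytable" is a placeholder and the prompt follows the style shown later in
this thread. The scan should show bulk imported entries even before the
monitor counts update, and the compact forces the counts to catch up.)

root@cb mytable> scan
root@cb mytable> compact -t mytable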

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
> Everything in your process sounds correct. There is no data besides the file used for the bulk import process. So generating on one hdfs and transferring it over should pose no problems.

ok, that is great to know.

> Can you elaborate on the differences in regard to the examples artifact? The examples should have no effect on any regular aspects of the system. So if you could elaborate on how it doesn't work, that would be a good start. Like does it error, silently fail, etc.? One good place to look is the monitor page, as that will spit up any errors/warnings you get from the tservers, so that way if there's an error rising up from the rfiles you're using, you should be able to see that so we can correct it.

We generate the rfile to a local hdfs instance and then copy it up to
the hdfs that the import needs to operate from.  It's under some random
location based on the classname of the job copying it at the moment.
We then prepare the call to the import directory and that executes
successfully.  The 00000_00000.rf file is getting picked up and copied
over into the table the import process wants it in, some
tables/#/bulk_uuid named deal.  From that point on it just seems to be
silent.  No warnings in the monitor, nothing.

We bring up cbshell and run a scan on the table in question and it's
empty... run a compaction (which is apparently not needed) and no data.
It just seems to disappear into the æther.

Debugging the client side of the BulkImportHelper, it seems to me that
the [0-9]* getGlobs call on the hadoop side returns nothing, so I
figured it might be file name related for a while there, but then we
saw that the example used the same file naming conventions... so I'm at
a loss at the moment.

> Also, glancing at your other emails, are you putting them in the Accumulo directory in hdfs when you move them to the true hdfs instance? I highly suggest you don't do that if you are. Let bulk import put them in the right place in the accumulo directory. It will just keep things simpler.

no, we are letting the import directory call manage all of that.

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
I'll see what I can do, might have to send it directly to you but
looking into it.

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 12:37, John W Vines <jo...@ugov.gov> wrote:
>
>
> ----- Original Message -----
> | From: "Jesse McConnell" <je...@gmail.com>
> | To: accumulo-user@incubator.apache.org
> | Sent: Tuesday, October 18, 2011 1:20:30 PM
> | Subject: Re: TableOperations import directory
> | So this is a sample line from your bulk ingest file:
> |
> | row_00000000 foo:0 [] 1318956047982 false -> value_00000000
> |
> | and this is a sample from mine:
> |
> | !:6 ColumnFamilyName:C%b5;*/.:%dc;%fe;+%1e;%1a;%8c;%00;%00;%01;: []
> | 1318957045232 false ->
> |
> | and another from the same file:
> |
> | H%00;~]5iSHT%a0;Z%19;%00;%00;%00;%00; DifferentColumnFamily:10:7#do []
> | 1318957045532 false -> 244
> |
> | Anything jump out as bogus with this?
> |
> | cheers,
> | jesse
> |
> |
> | --
> | jesse mcconnell
> | jesse.mcconnell@gmail.com
>
> Could you generate a small rfile and send it to us so we can verify? We're a bit stumped by this error and want to see if it will break on our end too.
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
I figured out the root of the problem...

When I passed in a fully qualified failure path on the dfs, I was getting
an error about the wrong FS, saying that it needed to be file:///

I was able to resolve this by calling CachedConfiguration.set() with the
configuration I was using, which allowed the BulkImportHelper to get the
correct cached configuration object so it could in turn create the FAIL
dir on the right filesystem _and_ then call getGlobs with the [0-9]*
pattern on the correct fs...

so it was able to copy the file in the importDirectory over just fine,
but when it went to make the failure directory that was being passed
in, it would pull a bad configuration and then get itself hopelessly
lost.

I am seeing imported data now :)
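
(For anyone hitting the same thing, a rough sketch of the idea -- hostnames
and paths are placeholders, and the CachedConfiguration call is the one
referred to above, so check its exact name and signature against your
version:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Point the configuration at the HDFS instance the import reads from,
    // instead of letting it default to the local file:/// filesystem.
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode:9000");  // placeholder namenode

    // Make this the configuration the client-side bulk import code picks up,
    // so the failure dir and the [0-9]* glob resolve against the same fs.
    // (CachedConfiguration is the helper named earlier in this thread.)
    CachedConfiguration.set(conf);

    // Pass a failure dir that lives on the same filesystem as the import dir.
    Path failureDir = new Path("hdfs://namenode:9000/user/ingest/failures");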

thanks much guys,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 15:14, Keith Turner <ke...@deenlo.com> wrote:
>> Not seeing this either, again, while debugging here I don't ever see
>> any files being returned in the BulkImportHelper, it always returns an
>> empty list of mapFilesInfo which led me down the rathole to getGlobs
>> pattern matching [0-9]* files.  Which is what led us to start looking
>> at your example to see what was fundamentally different. :/
>>
>
> Seems like once we ran into an issue w/ bulk import where a source
> file was passed in instead of a source dir.  Seems like the hdfs
> listStatus call did something odd in this case, like returned an empty
> set.  So maybe nothing happened.  My memory is fuzzy on this though.
> Make sure you are passing the correct dir to the import call.
>

Re: TableOperations import directory

Posted by Keith Turner <ke...@deenlo.com>.
> Not seeing this either, again, while debugging here I don't ever see
> any files being returned in the BulkImportHelper, it always returns an
> empty list of mapFilesInfo which led me down the rathole to getGlobs
> pattern matching [0-9]* files.  Which is what led us to start looking
> at your example to see what was fundamentally different. :/
>

Seems like once we ran into an issue w/ bulk import where a source
file was passed in instead of a source dir.  Seems like the hdfs
listStatus call did something odd in this case, like returned an empty
set.  So maybe nothing happened.  My memory is fuzzy on this though.
Make sure you are passing the correct dir to the import call.

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
I have an idea... it could be that the fail dir not being on the correct
FS is then making subsequent calls in that BulkImportHelper method not
execute against the correct file system... that would explain why it's
not finding the rf files...

validating now

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 15:06, Jesse McConnell
<je...@gmail.com> wrote:
> Hold the horses...now when I importDirectory from the shell I am
> seeing information in the scan...
>
> so that means the rfile itself is fine...
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Tue, Oct 18, 2011 at 15:00, Jesse McConnell
> <je...@gmail.com> wrote:
>>>  1) Move files from src dir into an /accumulo/tables/<tid>/bulk_XXX
>>> dir.
>>
>>> So after the bulk import files should be missing from source dir, is this the case?
>>
>> Yep
>>
>>> Do you see bulk dirs under the /accumulo/tables/<tid> dir, if so do they contain anything?
>>
>> Yep
>>
>>> The move is done by a random tablet server.  In some tablet server debug log,
>>> should see messages like "Moved <src> to <dest>" coming from the class
>>> ClientServiceHandler.
>>
>> I have seen this in the debugging as well
>>
>>>  2) Inspect the files to determine first and last key.  I did not see
>>> any client side related debugging for this.
>>
>> I see the first and last key using the rfile.PrintInfo and it looks
>> reasonable, though it does seem to mention only one of the column
>> families in play for the file.
>>
>>>  3) Contact tablet servers to assign files to tablet.  There should
>>> be client and server side debugging related to this. If you set the
>>> log4j level to DEBUG on the client side for org.apache.accumulo.core,
>>> I am thinking you should see the following message.
>>>
>>>           log.debug("Assigning " + uniqMapFiles.size() + " map files to "
>>>                    + assignmentsPerTablet.size() + " tablets at " + location);
>>
>> Not seeing this either, again, while debugging here I don't ever see
>> any files being returned in the BulkImportHelper, it always returns an
>> empty list of mapFilesInfo which led me down the rathole to getGlobs
>> pattern matching [0-9]* files.  Which is what led us to start looking
>> at your example to see what was fundamentally different. :/
>>
>>> On the server side, you should see the following on a tablet server
>>>
>>>          log.log(TLevel.TABLET_HIST, extent+" import "+path+"
>>> "+paths.get(tpath));
>>>
>>> Could grep for TABLET_HIST and then grep for import to find this.
>>
>> I don't see any 'import' text in the tserver_host.debug.log like this...
>>
>>>  4) Anything that fails, is copied to the failure dir you supplied.
>>> Is the failure dir empty after the import?
>>
>> Empty, but I should double check, I get errors if this failure
>> directory is on the same hdfs as the input to the importDirectory, it
>> errors and says its expecting file:///  But when I just pass in a new
>> Path("FAIL-" + uuid) relative path it runs.
>>
>> thanks for the help
>>
>> cheers,
>> jesse
>>
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
Hold the horses...now when I importDirectory from the shell I am
seeing information in the scan...

so that means the rfile itself is fine...

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 15:00, Jesse McConnell
<je...@gmail.com> wrote:
>>  1) Move files from src dir into an /accumulo/tables/<tid>/bulk_XXX
>> dir.
>
>> So after the bulk import files should be missing from source dir, is this the case?
>
> Yep
>
>> Do you see bulk dirs under the /accumulo/tables/<tid> dir, if so do they contain anything?
>
> Yep
>
>> The move is done by a random tablet server.  In some tablet server debug log,
>> should see messages like "Moved <src> to <dest>" coming from the class
>> ClientServiceHandler.
>
> I have seen this in the debugging as well
>
>>  2) Inspect the files to determine first and last key.  I did not see
>> any client side related debugging for this.
>
> I see the first and last key using the rfile.PrintInfo and it looks
> reasonable, though it does seem to mention only one of the column
> families in play for the file.
>
>>  3) Contact tablet servers to assign files to tablet.  There should
>> be client and server side debugging related to this. If you set the
>> log4j level to DEBUG on the client side for org.apache.accumulo.core,
>> I am thinking you should see the following message.
>>
>>           log.debug("Assigning " + uniqMapFiles.size() + " map files to "
>>                    + assignmentsPerTablet.size() + " tablets at " + location);
>
> Not seeing this either, again, while debugging here I don't ever see
> any files being returned in the BulkImportHelper, it always returns an
> empty list of mapFilesInfo which led me down the rathole to getGlobs
> pattern matching [0-9]* files.  Which is what led us to start looking
> at your example to see what was fundamentally different. :/
>
>> On the server side, you should see the following on a tablet server
>>
>>          log.log(TLevel.TABLET_HIST, extent+" import "+path+"
>> "+paths.get(tpath));
>>
>> Could grep for TABLET_HIST and then grep for import to find this.
>
> I don't see any 'import' text in the tserver_host.debug.log like this...
>
>>  4) Anything that fails, is copied to the failure dir you supplied.
>> Is the failure dir empty after the import?
>
> Empty, but I should double check, I get errors if this failure
> directory is on the same hdfs as the input to the importDirectory, it
> errors and says its expecting file:///  But when I just pass in a new
> Path("FAIL-" + uuid) relative path it runs.
>
> thanks for the help
>
> cheers,
> jesse
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
>  1) Move files from src dir into an /accumulo/tables/<tid>/bulk_XXX
> dir.

> So after the bulk import files should be missing from source dir, is this the case?

Yep

> Do you see bulk dirs under the /accumulo/tables/<tid> dir, if so do they contain anything?

Yep

> The move is done by a random tablet server.  In some tablet server debug log,
> should see messages like "Moved <src> to <dest>" coming from the class
> ClientServiceHandler.

I have seen this in the debugging as well

>  2) Inspect the files to determine first and last key.  I did not see
> any client side related debugging for this.

I see the first and last key using the rfile.PrintInfo and it looks
reasonable, though it does seem to mention only one of the column
families in play for the file.

>  3) Contact tablet servers to assign files to tablet.  There should
> be client and server side debugging related to this. If you set the
> log4j level to DEBUG on the client side for org.apache.accumulo.core,
> I am thinking you should see the following message.
>
>           log.debug("Assigning " + uniqMapFiles.size() + " map files to "
>                    + assignmentsPerTablet.size() + " tablets at " + location);

Not seeing this either, again, while debugging here I don't ever see
any files being returned in the BulkImportHelper, it always returns an
empty list of mapFilesInfo which led me down the rathole to getGlobs
pattern matching [0-9]* files.  Which is what led us to start looking
at your example to see what was fundamentally different. :/

> On the server side, you should see the following on a tablet server
>
>          log.log(TLevel.TABLET_HIST, extent+" import "+path+"
> "+paths.get(tpath));
>
> Could grep for TABLET_HIST and then grep for import to find this.

I don't see any 'import' text in the tserver_host.debug.log like this...

>  4) Anything that fails, is copied to the failure dir you supplied.
> Is the failure dir empty after the import?

Empty, but I should double check. I get errors if this failure
directory is on the same hdfs as the input to importDirectory; it
errors and says it's expecting file:///.  But when I just pass in a new
Path("FAIL-" + uuid) relative path, it runs.

thanks for the help

cheers,
jesse

Re: TableOperations import directory

Posted by Keith Turner <ke...@deenlo.com>.
Jesse,

Bulk import does the following

  1) Move files from src dir into an /accumulo/tables/<tid>/bulk_XXX
dir.  So after the bulk import, files should be missing from the source
dir -- is this the case?  Do you see bulk dirs under the
/accumulo/tables/<tid> dir, and if so do they contain anything?  The move
is done by a random tablet server.  In some tablet server debug log, you
should see messages like "Moved <src> to <dest>" coming from the class
ClientServiceHandler.
  2) Inspect the files to determine first and last key.  I did not see
any client side related debugging for this.
  3) Contact tablet servers to assign files to tablet.  There should
be client and server side debugging related to this. If you set the
log4j level to DEBUG on the client side for org.apache.accumulo.core,
I am thinking you should see the following message.

           log.debug("Assigning " + uniqMapFiles.size() + " map files to "
                    + assignmentsPerTablet.size() + " tablets at " + location);
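
A minimal log4j.properties fragment for turning that client-side logging on
(standard log4j property syntax; it attaches to whatever appenders the
client config already defines):

    log4j.logger.org.apache.accumulo.core=DEBUG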

On the server side, you should see the following on a tablet server

          log.log(TLevel.TABLET_HIST, extent+" import "+path+"
"+paths.get(tpath));

Could grep for TABLET_HIST and then grep for import to find this.

  4) Anything that fails is copied to the failure dir you supplied.
Is the failure dir empty after the import?
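
(Hedged command-line checks for steps 1 and 4 -- the table id and failure
path are placeholders:)

    hadoop fs -ls /accumulo/tables/<tid>      # step 1: look for bulk_* dirs with files in them
    hadoop fs -ls /user/ingest/failures       # step 4: the failure dir should come back empty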

Keith

Re: No Such SessionID reported on tablet server

Posted by Keith Massey <ke...@digitalreasoning.com>.
On 10/20/11 4:03 PM, Keith Turner wrote:
> On Thu, Oct 20, 2011 at 4:38 PM, Keith Massey
> <ke...@digitalreasoning.com>  wrote:
>> They are not, but we are not seeing any exceptions, so they must all be
>> getting called. Our mapreduce job would fail if any of them failed to close
>> (maybe not the best idea, but at least it would be obvious if we had
>> failure).
>>
> Ok,
>
> What process do you use to determine what data is missing?  Are there
> any patterns to the missing data? Do you have any iterators on your
> tables?  Are you writing multiple versions (by default only one
> version is kept and the others are dropped)?  Are you setting the
> visibility column?
>
> Are you reusing or modifying Mutation objects after you give them to the
> batch writer?  The BatchWriter buffers them and writes them in the
> background.  It creates a shallow copy when you give it a mutation.

Good question. I had been building the key incorrectly in some cases. 
Once I fixed my code the problems went away. Sorry for the false alarm, 
and thanks for the help.

Re: No Such SessionID reported on tablet server

Posted by Keith Turner <ke...@deenlo.com>.
On Thu, Oct 20, 2011 at 4:38 PM, Keith Massey
<ke...@digitalreasoning.com> wrote:
>
> They are not, but we are not seeing any exceptions, so they must all be
> getting called. Our mapreduce job would fail if any of them failed to close
> (maybe not the best idea, but at least it would be obvious if we had
> failure).
>

Ok,

What process do you use to determine what data is missing?  Are there
any patterns to the missing data? Do you have any iterators on your
tables?  Are you writing multiple versions (by default only one
version is kept and the others are dropped)?  Are you setting the
visibility column?

Are you reusing or modifying Mutation objects after you give them to the
batch writer?  The BatchWriter buffers them and writes them in the
background.  It creates a shallow copy when you give it a mutation.
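
(A minimal sketch of the pattern being asked about -- build a fresh Mutation
per row and don't touch it after handing it to the BatchWriter. Class names
follow the Accumulo API; the cloudbase 1.3.2 package names differ, so treat
this as illustrative:)

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class MutationPerRow {
      // One new Mutation per row; the BatchWriter keeps a reference to it,
      // so reusing or modifying it afterwards can corrupt buffered data.
      static void writeRows(BatchWriter writer, Iterable<String> rows) throws Exception {
        for (String row : rows) {
          Mutation m = new Mutation(new Text(row));
          m.put(new Text("fam"), new Text("qual"), new Value("v".getBytes()));
          writer.addMutation(m);
          // do not call m.put(...) or reuse m after this point
        }
      }
    }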

Re: No Such SessionID reported on tablet server

Posted by Keith Massey <ke...@digitalreasoning.com>.
On 10/20/11 3:35 PM, Keith Turner wrote:
> On Thu, Oct 20, 2011 at 4:30 PM, Keith Massey
> <ke...@digitalreasoning.com>  wrote:
>> We're calling close() on each of the BatchWriters in the cleanup() method of
>> our mapper (we're only using mappers). I can try to use the
>> MultiTableBatchWriter but it sounds like that will be for performance gains
>> only, right?
>>
> Are batch writers closed in such a way that if one close call throws
> an exception, subsequent batch writers are still closed?

They are not, but we are not seeing any exceptions, so they must all be 
getting called. Our mapreduce job would fail if any of them failed to 
close (maybe not the best idea, but at least it would be obvious if we 
had failure).

Re: No Such SessionID reported on tablet server

Posted by Keith Turner <ke...@deenlo.com>.
On Thu, Oct 20, 2011 at 4:30 PM, Keith Massey
<ke...@digitalreasoning.com> wrote:
>
> We're calling close() on each of the BatchWriters in the cleanup() method of
> our mapper (we're only using mappers). I can try to use the
> MultiTableBatchWriter but it sounds like that will be for performance gains
> only, right?
>

Are batch writers closed in such a way that if one close call throws
an exception, subsequent batch writers are still closed?
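
(One hedged way to make sure every writer gets a close() attempt even if an
earlier one throws -- generic Java, with BatchWriter standing in for the
Accumulo/cloudbase writer type:)

    import java.util.List;
    import org.apache.accumulo.core.client.BatchWriter;

    public class CloseAll {
      // Close every writer, remember the first failure, and rethrow it at the
      // end, so one bad close() doesn't leave the remaining writers unflushed.
      static void closeAll(List<BatchWriter> writers) throws Exception {
        Exception first = null;
        for (BatchWriter w : writers) {
          try {
            w.close();
          } catch (Exception e) {
            if (first == null) {
              first = e;
            }
          }
        }
        if (first != null) {
          throw first;
        }
      }
    }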

Re: No Such SessionID reported on tablet server

Posted by Keith Massey <ke...@digitalreasoning.com>.
On 10/20/11 3:23 PM, John W Vines wrote:
> ----- Original Message -----
> So I'm under the impression you are writing to multiple tables in your reducer (or mapper with no reduce process). If you do not close your batchwriters, they will not flush the buffered data that has not yet been sent. Make sure you call close() on them before you wrap up the process to make sure data goes through.
>
> On a similar note, if you're ingesting to multiple tables, I highly recommend you use a MultiTableBatchWriter, and then get the BatchWriters you need from that. It's more efficient in the way it sends data into the tservers and that way you only have to worry about a single close() call. If you're only ingesting into a single table, you may want to consider one of our OutputFormats, as they handle the closing and whatnot involved so you don't have to.
>
> As for your error, I wouldn't be concerned about it. If you take some time between sending data, whether your client process was swapping or it just does a lot of computation between writing data, then the server side will think the client session is over and close it. When the BatchWriter attempts to reconnect it will handle the error and create a new session so there should be no problems stemming from that.
>
> Let us know if you have anymore problems
> John

We're calling close() on each of the BatchWriters in the cleanup() 
method of our mapper (we're only using mappers). I can try to use the 
MultiTableBatchWriter but it sounds like that will be for performance 
gains only, right?

Re: No Such SessionID reported on tablet server

Posted by John W Vines <jo...@ugov.gov>.

----- Original Message -----
| From: "John W Vines" <jo...@ugov.gov>
| To: accumulo-user@incubator.apache.org
| Sent: Thursday, October 20, 2011 4:23:48 PM
| Subject: Re: No Such SessionID reported on tablet server
| ----- Original Message -----
| | From: "Keith Massey" <ke...@digitalreasoning.com>
| | To: accumulo-user@incubator.apache.org
| | Sent: Thursday, October 20, 2011 4:06:08 PM
| | Subject: No Such SessionID reported on tablet server
| | We are loading data into cloudbase 1.3.2 using
| | cloudbase.core.data.Mutation.BatchWriter.addMutation() from a
| | map/reduce
| | job. We use one BatchWriter per table. The data appears to go in
| | fine
| | --
| | no exceptions are reported in the map/reduce job. And most of the
| | data
| | does appear to be there. But some of it (maybe 1% if I had to guess)
| | is
| | not there in our cloudbase tables. The only errors we have seen
| | anywhere
| | are in the tserver logs. They look like this:
| |
| | 20 16:26:21,529 [server.TThreadPoolServer] ERROR: Error occurred
| | during
| | processing of message.
| | java.lang.RuntimeException: No Such SessionID
| | at
| | cloudbase.server.tabletserver.TabletServer$ThriftClientHandler.applyUpdate(TabletServer.java:1327)
| | at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
| | at
| | sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
| | at java.lang.reflect.Method.invoke(Method.java:597)
| | at
| | cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:43)
| | at $Proxy1.applyUpdate(Unknown Source)
| | at
| | cloudbase.core.tabletserver.thrift.TabletClientService$Processor$applyUpdate.process(TabletClientService.java:1184)
| | at
| | cloudbase.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:885)
| | at
| | cloudbase.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:121)
| | at
| | org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
| | at
| | java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
| | at
| | java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
| | at java.lang.Thread.run(Thread.java:619)
| |
| | These don't seem to make it back to the client side though. And I
| | don't
| | believe I control any kind of session id. Any ideas what I can do?
| | Thanks.
| |
| | Keith
| 
| 
| So I'm under the impression you are writing to multiple tables in your
| reducer (or mapper with no reduce process). If you do not close your
| batchwriters, they will not flush the buffered data that has not yet
| been sent. Make sure you call close() on them before you wrap up the
| process to make sure data goes through.
| 
| On a similar note, if you're ingesting to multiple tables, I highly
| recommend you use a MultiTableBatchWriter, and then get the
| BatchWriters you need from that. It's more efficient in the way it
| sends data into the tservers and that way you only have to worry about
| a single close() call. If you're only ingesting into a single table,
| you may want to consider one of our OutputFormats, as they handle the
| closing and whatnot involved so you don't have to.
| 
| As for your error, I wouldn't be concerned about it. If you take some
| time between sending data, whether your client process was swapping or
| it just does a lot of computation between writing data, then the
| server side will think the client session is over and close it. When
| the BatchWriter attempts to reconnect it will handle the error and
| create a new session so there should be no problems stemming from
| that.
| 
| Let us know if you have anymore problems
| John


Correction- our OutputFormat allows writing to multiple tables, so I suggest you just use that.

And another addendum regarding your error - upon further consideration, the explanation is probably more along the lines of your tabletserver swapping, or something else that would cause general slowness on your tserver, or a network congestion issue. Heavy computation times will only have an effect if they are interfering with the thread that is sending the data off to the tserver.


Sorry about the mistake
John

Re: No Such SessionID reported on tablet server

Posted by John W Vines <jo...@ugov.gov>.

----- Original Message -----
| From: "Keith Massey" <ke...@digitalreasoning.com>
| To: accumulo-user@incubator.apache.org
| Sent: Thursday, October 20, 2011 4:06:08 PM
| Subject: No Such SessionID reported on tablet server
| We are loading data into cloudbase 1.3.2 using
| cloudbase.core.data.Mutation.BatchWriter.addMutation() from a
| map/reduce
| job. We use one BatchWriter per table. The data appears to go in fine
| --
| no exceptions are reported in the map/reduce job. And most of the data
| does appear to be there. But some of it (maybe 1% if I had to guess)
| is
| not there in our cloudbase tables. The only errors we have seen
| anywhere
| are in the tserver logs. They look like this:
| 
| 20 16:26:21,529 [server.TThreadPoolServer] ERROR: Error occurred
| during
| processing of message.
| java.lang.RuntimeException: No Such SessionID
| at
| cloudbase.server.tabletserver.TabletServer$ThriftClientHandler.applyUpdate(TabletServer.java:1327)
| at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
| at
| sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
| at java.lang.reflect.Method.invoke(Method.java:597)
| at
| cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:43)
| at $Proxy1.applyUpdate(Unknown Source)
| at
| cloudbase.core.tabletserver.thrift.TabletClientService$Processor$applyUpdate.process(TabletClientService.java:1184)
| at
| cloudbase.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:885)
| at
| cloudbase.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:121)
| at
| org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
| at
| java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
| at
| java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
| at java.lang.Thread.run(Thread.java:619)
| 
| These don't seem to make it back to the client side though. And I
| don't
| believe I control any kind of session id. Any ideas what I can do?
| Thanks.
| 
| Keith


So I'm under the impression you are writing to multiple tables in your reducer (or mapper with no reduce process). If you do not close your batchwriters, they will not flush the buffered data that has not yet been sent. Make sure you call close() on them before you wrap up the process to make sure data goes through.

On a similar note, if you're ingesting to multiple tables, I highly recommend you use a MultiTableBatchWriter, and then get the BatchWriters you need from that. It's more efficient in the way it sends data into the tservers and that way you only have to worry about a single close() call. If you're only ingesting into a single table, you may want to consider one of our OutputFormats, as they handle the closing and whatnot involved so you don't have to.
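
(A rough sketch of that pattern, assuming an existing connector and
Accumulo-style class names -- the createMultiTableBatchWriter arguments
(max memory in bytes, max latency in ms, write threads) and the table names
are placeholders, and cloudbase 1.3.2 packages will differ:)

    // one writer object manages buffering for all tables
    MultiTableBatchWriter mtbw = connector.createMultiTableBatchWriter(1000000L, 1000L, 4);
    BatchWriter edges = mtbw.getBatchWriter("edges");   // placeholder table names
    BatchWriter nodes = mtbw.getBatchWriter("nodes");
    // ... addMutation() calls against edges/nodes ...
    mtbw.close();   // single close flushes and shuts down every underlying writer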

As for your error, I wouldn't be concerned about it. If you take some time between sending data, whether your client process was swapping or it just does a lot of computation between writing data, then the server side will think the client session is over and close it. When the BatchWriter attempts to reconnect it will handle the error and create a new session so there should be no problems stemming from that.

Let us know if you have any more problems
John

No Such SessionID reported on tablet server

Posted by Keith Massey <ke...@digitalreasoning.com>.
We are loading data into cloudbase 1.3.2 using 
cloudbase.core.data.Mutation.BatchWriter.addMutation() from a map/reduce 
job. We use one BatchWriter per table. The data appears to go in fine -- 
no exceptions are reported in the map/reduce job. And most of the data 
does appear to be there. But some of it (maybe 1% if I had to guess) is 
not there in our cloudbase tables. The only errors we have seen anywhere 
are in the tserver logs. They look like this:

20 16:26:21,529 [server.TThreadPoolServer] ERROR: Error occurred during 
processing of message.
java.lang.RuntimeException: No Such SessionID
         at 
cloudbase.server.tabletserver.TabletServer$ThriftClientHandler.applyUpdate(TabletServer.java:1327)
         at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:43)
         at $Proxy1.applyUpdate(Unknown Source)
         at 
cloudbase.core.tabletserver.thrift.TabletClientService$Processor$applyUpdate.process(TabletClientService.java:1184)
         at 
cloudbase.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:885)
         at 
cloudbase.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:121)
         at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:619)

These don't seem to make it back to the client side though. And I don't 
believe I control any kind of session id. Any ideas what I can do? Thanks.

Keith

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
Doesn't look like it. I just ran the test again, and when I ran a scan on
the !METADATA table I see this:

root@cb !METADATA> scan
!0;!0< srv:dir []    /root_tablet
!0;!0< ~tab:~pr []    \x00
!0;~ file:/table_info/00005_00000.rf []    675,13
!0;~ last:133181bdef20003 []    127.0.0.1:9997
!0;~ loc:133181bdef20003 []    127.0.0.1:9997
!0;~ log:127.0.0.1:11224/360d2fa9-8984-4202-9fe6-5fc7404a10ae []
127.0.0.1:11224/360d2fa9-8984-4202-9fe6-5fc7404a10ae|3
!0;~ srv:dir []    /table_info
!0;~ srv:lock []    tservers/127.0.0.1:9997/zlock-0000000000$86439325833756675
!0;~ srv:time []    L40
!0;~ ~tab:~pr []    \x01!0<
!0< last:133181bdef20003 []    127.0.0.1:9997
!0< loc:133181bdef20003 []    127.0.0.1:9997
!0< log:127.0.0.1:11224/360d2fa9-8984-4202-9fe6-5fc7404a10ae []
127.0.0.1:11224/360d2fa9-8984-4202-9fe6-5fc7404a10ae|2
!0< srv:dir []    /default_tablet
!0< srv:lock []    tservers/127.0.0.1:9997/zlock-0000000000$86439325833756675
!0< srv:time []    L32
!0< ~tab:~pr []    \x01~
0< file:/default_tablet/00001_00000.rf []    17886,674
0< last:133181bdef20003 []    127.0.0.1:9997
0< loc:133181bdef20003 []    127.0.0.1:9997
0< log:127.0.0.1:11224/360d2fa9-8984-4202-9fe6-5fc7404a10ae []
127.0.0.1:11224/360d2fa9-8984-4202-9fe6-5fc7404a10ae|4
0< srv:dir []    /default_tablet
0< srv:lock []    tservers/127.0.0.1:9997/zlock-0000000000$86439325833756675
0< srv:time []    M1318962906953
0< ~tab:~pr []    \x00
7< loc:133181bdef20003 []    127.0.0.1:9997
7< srv:dir []    /default_tablet
7< srv:lock []    masters/lock/zlock-0000000000$86439325833756678
7< srv:time []    M0
7< ~tab:~pr []    \x00



--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 13:56, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> When you scan the !METADATA table, does the file you bulk imported appear in the list of files for your table?  (You'd have to check this before compacting.)
>
> Billie
>

Re: TableOperations import directory

Posted by Billie J Rinaldi <bi...@ugov.gov>.
When you scan the !METADATA table, does the file you bulk imported appear in the list of files for your table?  (You'd have to check this before compacting.)

Billie

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
---------+---------------------------------------------+------------------------
SCOPE    | NAME                                        | VALUE
---------+---------------------------------------------+------------------------
default  | table.balancer ............................ |
accumulo.server.master.balancer.DefaultLoadBalancer
default  | table.bloom.enabled ....................... | false
default  | table.bloom.error.rate .................... | 0.5%
default  | table.bloom.hash.type ..................... | murmur
default  | table.bloom.key.functor ................... |
accumulo.core.file.keyfunctor.RowFunctor
default  | table.bloom.load.threshold ................ | 1
default  | table.bloom.size .......................... | 1048576
default  | table.cache.block.enable .................. | false
default  | table.cache.index.enable .................. | false
default  | table.compaction.major.everything.at ...... | 19700101000000GMT
default  | table.compaction.major.everything.idle .... | 1h
default  | table.compaction.major.ratio .............. | 3
default  | table.compaction.minor.idle ............... | 5m
default  | table.compaction.minor.logs.threshold ..... | 3
default  | table.failures.ignore ..................... | false
default  | table.file.blocksize ...................... | 0B
default  | table.file.compress.blocksize ............. | 100K
default  | table.file.compress.type .................. | gz
default  | table.file.replication .................... | 0
default  | table.file.type ........................... | rf
default  | table.groups.enabled ...................... |
table    | table.iterator.majc.vers .................. |
20,accumulo.core.iterators.VersioningIterator
table    | table.iterator.majc.vers.opt.maxVersions .. | 1
table    | table.iterator.minc.vers .................. |
20,accumulo.core.iterators.VersioningIterator
table    | table.iterator.minc.vers.opt.maxVersions .. | 1
table    | table.iterator.scan.vers .................. |
20,accumulo.core.iterators.VersioningIterator
table    | table.iterator.scan.vers.opt.maxVersions .. | 1
default  | table.scan.cache.enable ................... | false
default  | table.scan.cache.size ..................... | 8M
default  | table.scan.max.memory ..................... | 50M
default  | table.security.scan.visibility.default .... |
default  | table.split.threshold ..................... | 1G
default  | table.walog.enabled ....................... | true
---------+---------------------------------------------+------------------------

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 13:19, Keith Turner <ke...@deenlo.com> wrote:
> Jesse,
>
> Can you send the output of running "config -t <your table>" in the shell?
>
> Keith
>

Re: TableOperations import directory

Posted by Keith Turner <ke...@deenlo.com>.
Jesse,

Can you send the output of running "config -t <your table>" in the shell?

Keith

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
in the mapping we are using the Key constructor that takes just
byte[]'s for the row, family, qualifier, value, etc..

a bit of the data looks binary, which I am trying to validate now... but
in general we ought to be getting _something_ showing up in the scan,
even given some bogus formatting, I would think

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 12:43, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> On Tuesday, October 18, 2011 1:20:30 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>> Anything jump out as bogus with this?
>
> No, not necessarily; it just looks like you have some binary data in your key.  Were you not expecting that?  Are you reusing Texts or anything?
>
> Accumulo doesn't look at individual lines to put them where they should go.  It just looks at the key range of the file and finds out which tablets cover that range.  If the keys weren't in sorted order that would be a problem; but you can't write an RFile with keys out of order.
>
> Billie
>

Re: TableOperations import directory

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Tuesday, October 18, 2011 1:20:30 PM, "Jesse McConnell" <je...@gmail.com> wrote:
> Anything jump out as bogus with this?

No, not necessarily; it just looks like you have some binary data in your key.  Were you not expecting that?  Are you reusing Texts or anything?

Accumulo doesn't look at individual lines to put them where they should go.  It just looks at the key range of the file and finds out which tablets cover that range.  If the keys weren't in sorted order that would be a problem; but you can't write an RFile with keys out of order.
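
(A tiny hedged check for that last point -- verifying key order in client
code before handing data to whatever writes the RFile; Key.compareTo is
standard Accumulo API, the rest is generic:)

    import java.util.List;
    import org.apache.accumulo.core.data.Key;

    public class SortCheck {
      // Returns the index of the first out-of-order key, or -1 if sorted.
      static int firstOutOfOrder(List<Key> keys) {
        for (int i = 1; i < keys.size(); i++) {
          if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
            return i;
          }
        }
        return -1;
      }
    }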

Billie

Re: TableOperations import directory

Posted by John W Vines <jo...@ugov.gov>.

----- Original Message -----
| From: "Jesse McConnell" <je...@gmail.com>
| To: accumulo-user@incubator.apache.org
| Sent: Tuesday, October 18, 2011 1:20:30 PM
| Subject: Re: TableOperations import directory
| So this is a sample line from your bulk ingest file:
| 
| row_00000000 foo:0 [] 1318956047982 false -> value_00000000
| 
| and this is a sample from mine:
| 
| !:6 ColumnFamilyName:C%b5;*/.:%dc;%fe;+%1e;%1a;%8c;%00;%00;%01;: []
| 1318957045232 false ->
| 
| and another from the same file:
| 
| H%00;~]5iSHT%a0;Z%19;%00;%00;%00;%00; DifferentColumnFamily:10:7#do []
| 1318957045532 false -> 244
| 
| Anything jump out as bogus with this?
| 
| cheers,
| jesse
| 
| 
| --
| jesse mcconnell
| jesse.mcconnell@gmail.com

Could you generate a small rfile and send it to us so we can verify? We're a bit stumped by this error and want to see if it will break on our end too.

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
So this is a sample line from your bulk ingest file:

row_00000000 foo:0 [] 1318956047982 false -> value_00000000

and this is a sample from mine:

!:6 ColumnFamilyName:C%b5;*/.:%dc;%fe;+%1e;%1a;%8c;%00;%00;%01;: []
1318957045232 false ->

and another from the same file:

H%00;~]5iSHT%a0;Z%19;%00;%00;%00;%00; DifferentColumnFamily:10:7#do []
1318957045532 false -> 244

Anything jump out as bogus with this?

cheers,
jesse


--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 12:04, Jesse McConnell
<je...@gmail.com> wrote:
> By rejected I mean the individual lines of goop in the rfile.
>
> I have a half a meg rf file that looks like it has reasonable goop
> inside of it, but import apparently fails silently for some reason.
> So my thought is things are getting rejected when whatever is
> processing that file is looking at the lines trying to put them where
> they ultimately go..
>
> I see no entries in the web UI and I see no results in the
> scan... presumably it's going _somewhere_
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Tue, Oct 18, 2011 at 11:51, Eric Newton <er...@gmail.com> wrote:
>> I'm not sure I know what you mean by rejected.  You can compare the
>> visibility marking in the Key against the authorizations of the user running
>> the scan.  If I have a visibility marking of "(A|B)&C", an you don't have
>> A,C or B,C or A,B,C as your authorizations, accumulo will not return
>> results.
>> The bulk ingest example sets the authorizations for the root user.
>>
>> -Eric
>>
>> On Tue, Oct 18, 2011 at 12:45 PM, Jesse McConnell
>> <je...@gmail.com> wrote:
>>>
>>> Largely yes, I am sorting through the 'good' rf file from the bulk
>>> ingest example and am trying to sort out what might be the issue with
>>> ours..
>>>
>>> Is there any special debug flags we can throw on that might give us
>>> information on what is being rejected?
>>>
>>> cheers,
>>> jesse
>>>
>>> --
>>> jesse mcconnell
>>> jesse.mcconnell@gmail.com
>>>
>>>
>>>
>>> On Tue, Oct 18, 2011 at 08:36, Eric Newton <er...@gmail.com> wrote:
>>> > If you use:
>>> > ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
>>> > /accumulo/tables/id/bulk_uuid/000000_000000.rf
>>> > Do you see your data?  Does it have visibility markings that would
>>> > filter
>>> > the data out?  Are the timestamps reasonable?
>>> >
>>> > On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell
>>> > <je...@gmail.com>
>>> > wrote:
>>> >>
>>> >> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
>>> >> <je...@gmail.com> wrote:
>>> >> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <er...@gmail.com>
>>> >> > wrote:
>>> >> >> It's possible that the bulkImport client code may be using the
>>> >> >> hadoop
>>> >> >> config
>>> >> >> from the java classpath, which is how it used to work.  I'll
>>> >> >> investigate it
>>> >> >> tomorrow.
>>> >> >
>>> >> > I chased that down in the debugging, unless your thinking that the
>>> >> > hadoop being used in the call to getGlobs on that file pattern my be
>>> >> > hitting a different fs...that might explain it.
>>> >>
>>> >> no no...because the copy from the import directory is actually being
>>> >> copied so I don't think its a different FS issue.
>>> >>
>>> >>
>>> >> > cheers,
>>> >> > jesse
>>> >> >
>>> >> >> See ACCUMULO-43.
>>> >> >> -Eric
>>> >> >>
>>> >> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines
>>> >> >> <jo...@ugov.gov>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> ----- Original Message -----
>>> >> >>> | From: "Jesse McConnell" <je...@gmail.com>
>>> >> >>> | To: accumulo-user@incubator.apache.org
>>> >> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
>>> >> >>> | Subject: Re: TableOperations import directory
>>> >> >>> | We are trying to run a unit test type scenario were we have a m-r
>>> >> >>> | process that generates input to the bulk import process in a
>>> >> >>> local
>>> >> >>> | hadoop fs, and then copy the resulting output to a directory on
>>> >> >>> the
>>> >> >>> | dfs that can then be used as input to the
>>> >> >>> | TableOperations.importDirectory() call.
>>> >> >>> |
>>> >> >>> | Is this a problem? Because the examples seem to work when we run
>>> >> >>> them
>>> >> >>> | with the -examples artifact in the lib dir but when that file is
>>> >> >>> | removed and we try and run it in the same sort of way as the unit
>>> >> >>> test
>>> >> >>> | above it doesn't work.
>>> >> >>> |
>>> >> >>> | Is there some sort of requirement that the data being generated
>>> >> >>> for
>>> >> >>> | the import be going to the import directory of the bulk load
>>> >> >>> process
>>> >> >>> | _have_ to be on the dfs?
>>> >> >>> |
>>> >> >>> | In other words, is a bad assumption that I could take data from
>>> >> >>> hadoop
>>> >> >>> | dfs X and copy it over to hadoop dfs Y and then import it with
>>> >> >>> the
>>> >> >>> | importDirectory command?
>>> >> >>> |
>>> >> >>> | Does the job metadata or the job configuration play any role in
>>> >> >>> the
>>> >> >>> | bulk import process?
>>> >> >>> |
>>> >> >>> | cheers,
>>> >> >>> | jesse
>>> >> >>> |
>>> >> >>> | --
>>> >> >>> | jesse mcconnell
>>> >> >>> | jesse.mcconnell@gmail.com
>>> >> >>> |
>>> >> >>>
>>> >> >>> Everything in your process sounds correct. There is no data besides
>>> >> >>> the
>>> >> >>> file used for the bulk import process. So generating on one hdfs
>>> >> >>> and
>>> >> >>> transferring it over should pose no problems.
>>> >> >>>
>>> >> >>> Can you elaborate on the differences in regard to the examples
>>> >> >>> artifact?
>>> >> >>> The examples should have no effect on any regular aspects of the
>>> >> >>> system. So
>>> >> >>> if you could elaborate on how it doesn't work, that would be a good
>>> >> >>> start.
>>> >> >>> Like does it error, silently fail, etc.? One good place to look is
>>> >> >>> the
>>> >> >>> monitor page, as that will spit up any errors/warnings you get from
>>> >> >>> the
>>> >> >>> tservers, so that way if there's an error rising up from the rfiles
>>> >> >>> your
>>> >> >>> using, you should be able to see that so we can correct it.
>>> >> >>>
>>> >> >>> Also, glancing at your other emails, are you putting them in the
>>> >> >>> Accumulo
>>> >> >>> directory in hdfs when you move them to the true hdfs instance? I
>>> >> >>> highly
>>> >> >>> suggest you don't do that if you are. Let bulk import put them in
>>> >> >>> the
>>> >> >>> right
>>> >> >>> place in the accumulo directory. It will just keep things simpler.
>>> >> >>
>>> >> >>
>>> >> >
>>> >
>>> >
>>
>>
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
By rejected I mean the individual lines of goop in the rfile.

I have a half a meg rf file that looks like it has reasonable goop
inside of it, but import apparently fails silently for some reason.
So my thought is things are getting rejected when whatever is
processing that file is looking at the lines trying to put them where
they ultimately go..

I see no entries in the web UI and I see no results in the
scan... presumably it's going _somewhere_

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 11:51, Eric Newton <er...@gmail.com> wrote:
> I'm not sure I know what you mean by rejected.  You can compare the
> visibility marking in the Key against the authorizations of the user running
> the scan.  If I have a visibility marking of "(A|B)&C", an you don't have
> A,C or B,C or A,B,C as your authorizations, accumulo will not return
> results.
> The bulk ingest example sets the authorizations for the root user.
>
> -Eric
>
> On Tue, Oct 18, 2011 at 12:45 PM, Jesse McConnell
> <je...@gmail.com> wrote:
>>
>> Largely yes, I am sorting through the 'good' rf file from the bulk
>> ingest example and am trying to sort out what might be the issue with
>> ours..
>>
>> Is there any special debug flags we can throw on that might give us
>> information on what is being rejected?
>>
>> cheers,
>> jesse
>>
>> --
>> jesse mcconnell
>> jesse.mcconnell@gmail.com
>>
>>
>>
>> On Tue, Oct 18, 2011 at 08:36, Eric Newton <er...@gmail.com> wrote:
>> > If you use:
>> > ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
>> > /accumulo/tables/id/bulk_uuid/000000_000000.rf
>> > Do you see your data?  Does it have visibility markings that would
>> > filter
>> > the data out?  Are the timestamps reasonable?
>> >
>> > On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell
>> > <je...@gmail.com>
>> > wrote:
>> >>
>> >> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
>> >> <je...@gmail.com> wrote:
>> >> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <er...@gmail.com>
>> >> > wrote:
>> >> >> It's possible that the bulkImport client code may be using the
>> >> >> hadoop
>> >> >> config
>> >> >> from the java classpath, which is how it used to work.  I'll
>> >> >> investigate it
>> >> >> tomorrow.
>> >> >
>> >> > I chased that down in the debugging, unless your thinking that the
>> >> > hadoop being used in the call to getGlobs on that file pattern my be
>> >> > hitting a different fs...that might explain it.
>> >>
>> >> no no...because the copy from the import directory is actually being
>> >> copied so I don't think its a different FS issue.
>> >>
>> >>
>> >> > cheers,
>> >> > jesse
>> >> >
>> >> >> See ACCUMULO-43.
>> >> >> -Eric
>> >> >>
>> >> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines
>> >> >> <jo...@ugov.gov>
>> >> >> wrote:
>> >> >>>
>> >> >>> ----- Original Message -----
>> >> >>> | From: "Jesse McConnell" <je...@gmail.com>
>> >> >>> | To: accumulo-user@incubator.apache.org
>> >> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
>> >> >>> | Subject: Re: TableOperations import directory
>> >> >>> | We are trying to run a unit test type scenario were we have a m-r
>> >> >>> | process that generates input to the bulk import process in a
>> >> >>> local
>> >> >>> | hadoop fs, and then copy the resulting output to a directory on
>> >> >>> the
>> >> >>> | dfs that can then be used as input to the
>> >> >>> | TableOperations.importDirectory() call.
>> >> >>> |
>> >> >>> | Is this a problem? Because the examples seem to work when we run
>> >> >>> them
>> >> >>> | with the -examples artifact in the lib dir but when that file is
>> >> >>> | removed and we try and run it in the same sort of way as the unit
>> >> >>> test
>> >> >>> | above it doesn't work.
>> >> >>> |
>> >> >>> | Is there some sort of requirement that the data being generated
>> >> >>> for
>> >> >>> | the import be going to the import directory of the bulk load
>> >> >>> process
>> >> >>> | _have_ to be on the dfs?
>> >> >>> |
>> >> >>> | In other words, is a bad assumption that I could take data from
>> >> >>> hadoop
>> >> >>> | dfs X and copy it over to hadoop dfs Y and then import it with
>> >> >>> the
>> >> >>> | importDirectory command?
>> >> >>> |
>> >> >>> | Does the job metadata or the job configuration play any role in
>> >> >>> the
>> >> >>> | bulk import process?
>> >> >>> |
>> >> >>> | cheers,
>> >> >>> | jesse
>> >> >>> |
>> >> >>> | --
>> >> >>> | jesse mcconnell
>> >> >>> | jesse.mcconnell@gmail.com
>> >> >>> |
>> >> >>>
>> >> >>> Everything in your process sounds correct. There is no data besides
>> >> >>> the
>> >> >>> file used for the bulk import process. So generating on one hdfs
>> >> >>> and
>> >> >>> transferring it over should pose no problems.
>> >> >>>
>> >> >>> Can you elaborate on the differences in regard to the examples
>> >> >>> artifact?
>> >> >>> The examples should have no effect on any regular aspects of the
>> >> >>> system. So
>> >> >>> if you could elaborate on how it doesn't work, that would be a good
>> >> >>> start.
>> >> >>> Like does it error, silently fail, etc.? One good place to look is
>> >> >>> the
>> >> >>> monitor page, as that will spit up any errors/warnings you get from
>> >> >>> the
>> >> >>> tservers, so that way if there's an error rising up from the rfiles
>> >> >>> your
>> >> >>> using, you should be able to see that so we can correct it.
>> >> >>>
>> >> >>> Also, glancing at your other emails, are you putting them in the
>> >> >>> Accumulo
>> >> >>> directory in hdfs when you move them to the true hdfs instance? I
>> >> >>> highly
>> >> >>> suggest you don't do that if you are. Let bulk import put them in
>> >> >>> the
>> >> >>> right
>> >> >>> place in the accumulo directory. It will just keep things simpler.
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>

Re: TableOperations import directory

Posted by Eric Newton <er...@gmail.com>.
I'm not sure I know what you mean by rejected.  You can compare the
visibility marking in the Key against the authorizations of the user running
the scan.  If I have a visibility marking of "(A|B)&C", and you don't have
A,C or B,C or A,B,C as your authorizations, accumulo will not return
results.

The bulk ingest example sets the authorizations for the root user.
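
(A minimal client-side sketch of checking that, assuming Accumulo-style class
names; the table name, the auth strings, and the connector are placeholders:)

    import java.util.Map.Entry;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class AuthCheck {
      // Scan with authorizations that satisfy the (A|B)&C marking above;
      // with missing or wrong auths those entries simply don't come back.
      static void dump(Connector connector) throws Exception {
        Scanner scanner = connector.createScanner("mytable", new Authorizations("A", "C"));
        for (Entry<Key,Value> e : scanner) {
          System.out.println(e.getKey() + " -> " + e.getValue());
        }
      }
    }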

-Eric

On Tue, Oct 18, 2011 at 12:45 PM, Jesse McConnell <jesse.mcconnell@gmail.com
> wrote:

> Largely yes, I am sorting through the 'good' rf file from the bulk
> ingest example and am trying to sort out what might be the issue with
> ours..
>
> Is there any special debug flags we can throw on that might give us
> information on what is being rejected?
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Tue, Oct 18, 2011 at 08:36, Eric Newton <er...@gmail.com> wrote:
> > If you use:
> > ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
> > /accumulo/tables/id/bulk_uuid/000000_000000.rf
> > Do you see your data?  Does it have visibility markings that would filter
> > the data out?  Are the timestamps reasonable?
> >
> > On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell <
> jesse.mcconnell@gmail.com>
> > wrote:
> >>
> >> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
> >> <je...@gmail.com> wrote:
> >> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <er...@gmail.com>
> >> > wrote:
> >> >> It's possible that the bulkImport client code may be using the hadoop
> >> >> config
> >> >> from the java classpath, which is how it used to work.  I'll
> >> >> investigate it
> >> >> tomorrow.
> >> >
> >> > I chased that down in the debugging, unless your thinking that the
> >> > hadoop being used in the call to getGlobs on that file pattern my be
> >> > hitting a different fs...that might explain it.
> >>
> >> no no...because the copy from the import directory is actually being
> >> copied so I don't think its a different FS issue.
> >>
> >>
> >> > cheers,
> >> > jesse
> >> >
> >> >> See ACCUMULO-43.
> >> >> -Eric
> >> >>
> >> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <john.w.vines@ugov.gov
> >
> >> >> wrote:
> >> >>>
> >> >>> ----- Original Message -----
> >> >>> | From: "Jesse McConnell" <je...@gmail.com>
> >> >>> | To: accumulo-user@incubator.apache.org
> >> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
> >> >>> | Subject: Re: TableOperations import directory
> >> >>> | We are trying to run a unit test type scenario were we have a m-r
> >> >>> | process that generates input to the bulk import process in a local
> >> >>> | hadoop fs, and then copy the resulting output to a directory on
> the
> >> >>> | dfs that can then be used as input to the
> >> >>> | TableOperations.importDirectory() call.
> >> >>> |
> >> >>> | Is this a problem? Because the examples seem to work when we run
> >> >>> them
> >> >>> | with the -examples artifact in the lib dir but when that file is
> >> >>> | removed and we try and run it in the same sort of way as the unit
> >> >>> test
> >> >>> | above it doesn't work.
> >> >>> |
> >> >>> | Is there some sort of requirement that the data being generated
> for
> >> >>> | the import be going to the import directory of the bulk load
> process
> >> >>> | _have_ to be on the dfs?
> >> >>> |
> >> >>> | In other words, is a bad assumption that I could take data from
> >> >>> hadoop
> >> >>> | dfs X and copy it over to hadoop dfs Y and then import it with the
> >> >>> | importDirectory command?
> >> >>> |
> >> >>> | Does the job metadata or the job configuration play any role in
> the
> >> >>> | bulk import process?
> >> >>> |
> >> >>> | cheers,
> >> >>> | jesse
> >> >>> |
> >> >>> | --
> >> >>> | jesse mcconnell
> >> >>> | jesse.mcconnell@gmail.com
> >> >>> |
> >> >>>
> >> >>> Everything in your process sounds correct. There is no data besides
> >> >>> the
> >> >>> file used for the bulk import process. So generating on one hdfs and
> >> >>> transferring it over should pose no problems.
> >> >>>
> >> >>> Can you elaborate on the differences in regard to the examples
> >> >>> artifact?
> >> >>> The examples should have no effect on any regular aspects of the
> >> >>> system. So
> >> >>> if you could elaborate on how it doesn't work, that would be a good
> >> >>> start.
> >> >>> Like does it error, silently fail, etc.? One good place to look is
> the
> >> >>> monitor page, as that will spit up any errors/warnings you get from
> >> >>> the
> >> >>> tservers, so that way if there's an error rising up from the rfiles
> >> >>> your
> >> >>> using, you should be able to see that so we can correct it.
> >> >>>
> >> >>> Also, glancing at your other emails, are you putting them in the
> >> >>> Accumulo
> >> >>> directory in hdfs when you move them to the true hdfs instance? I
> >> >>> highly
> >> >>> suggest you don't do that if you are. Let bulk import put them in
> the
> >> >>> right
> >> >>> place in the accumulo directory. It will just keep things simpler.
> >> >>
> >> >>
> >> >
> >
> >
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
Largely yes, I am going through the 'good' rf file from the bulk
ingest example and trying to work out what might be the issue with
ours...

Are there any special debug flags we can turn on that might give us
information on what is being rejected?

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 08:36, Eric Newton <er...@gmail.com> wrote:
> If you use:
> ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
> /accumulo/tables/id/bulk_uuid/000000_000000.rf
> Do you see your data?  Does it have visibility markings that would filter
> the data out?  Are the timestamps reasonable?
>
> On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell <je...@gmail.com>
> wrote:
>>
>> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
>> <je...@gmail.com> wrote:
>> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <er...@gmail.com>
>> > wrote:
>> >> It's possible that the bulkImport client code may be using the hadoop
>> >> config
>> >> from the java classpath, which is how it used to work.  I'll
>> >> investigate it
>> >> tomorrow.
>> >
>> > I chased that down in the debugging, unless your thinking that the
>> > hadoop being used in the call to getGlobs on that file pattern my be
>> > hitting a different fs...that might explain it.
>>
>> no no...because the copy from the import directory is actually being
>> copied so I don't think its a different FS issue.
>>
>>
>> > cheers,
>> > jesse
>> >
>> >> See ACCUMULO-43.
>> >> -Eric
>> >>
>> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <jo...@ugov.gov>
>> >> wrote:
>> >>>
>> >>> ----- Original Message -----
>> >>> | From: "Jesse McConnell" <je...@gmail.com>
>> >>> | To: accumulo-user@incubator.apache.org
>> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
>> >>> | Subject: Re: TableOperations import directory
>> >>> | We are trying to run a unit test type scenario were we have a m-r
>> >>> | process that generates input to the bulk import process in a local
>> >>> | hadoop fs, and then copy the resulting output to a directory on the
>> >>> | dfs that can then be used as input to the
>> >>> | TableOperations.importDirectory() call.
>> >>> |
>> >>> | Is this a problem? Because the examples seem to work when we run
>> >>> them
>> >>> | with the -examples artifact in the lib dir but when that file is
>> >>> | removed and we try and run it in the same sort of way as the unit
>> >>> test
>> >>> | above it doesn't work.
>> >>> |
>> >>> | Is there some sort of requirement that the data being generated for
>> >>> | the import be going to the import directory of the bulk load process
>> >>> | _have_ to be on the dfs?
>> >>> |
>> >>> | In other words, is a bad assumption that I could take data from
>> >>> hadoop
>> >>> | dfs X and copy it over to hadoop dfs Y and then import it with the
>> >>> | importDirectory command?
>> >>> |
>> >>> | Does the job metadata or the job configuration play any role in the
>> >>> | bulk import process?
>> >>> |
>> >>> | cheers,
>> >>> | jesse
>> >>> |
>> >>> | --
>> >>> | jesse mcconnell
>> >>> | jesse.mcconnell@gmail.com
>> >>> |
>> >>>
>> >>> Everything in your process sounds correct. There is no data besides
>> >>> the
>> >>> file used for the bulk import process. So generating on one hdfs and
>> >>> transferring it over should pose no problems.
>> >>>
>> >>> Can you elaborate on the differences in regard to the examples
>> >>> artifact?
>> >>> The examples should have no effect on any regular aspects of the
>> >>> system. So
>> >>> if you could elaborate on how it doesn't work, that would be a good
>> >>> start.
>> >>> Like does it error, silently fail, etc.? One good place to look is the
>> >>> monitor page, as that will spit up any errors/warnings you get from
>> >>> the
>> >>> tservers, so that way if there's an error rising up from the rfiles
>> >>> your
>> >>> using, you should be able to see that so we can correct it.
>> >>>
>> >>> Also, glancing at your other emails, are you putting them in the
>> >>> Accumulo
>> >>> directory in hdfs when you move them to the true hdfs instance? I
>> >>> highly
>> >>> suggest you don't do that if you are. Let bulk import put them in the
>> >>> right
>> >>> place in the accumulo directory. It will just keep things simpler.
>> >>
>> >>
>> >
>
>

Re: TableOperations import directory

Posted by Eric Newton <er...@gmail.com>.
If you use:

./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
/accumulo/tables/id/bulk_uuid/000000_000000.rf

Do you see your data?  Does it have visibility markings that would filter
the data out?  Are the timestamps reasonable?
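
One more client-side check for the visibility angle is a quick scan
comparison, sketched below.  The instance, zookeeper, user and table
names are placeholders, and the getConnector/getUserAuthorizations
signatures may differ a little between releases.  If entries show up
with the user's authorizations but not with an empty Authorizations,
the keys carry column visibilities that are filtering them out.

import java.util.Map.Entry;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class VisibilityCheck {
  public static void main(String[] args) throws Exception {
    // placeholder connection details
    Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
        .getConnector("root", "secret".getBytes());

    // count entries with the user's full authorizations
    Authorizations auths = conn.securityOperations().getUserAuthorizations("root");
    int withAuths = count(conn.createScanner("mytable", auths));

    // and again with no authorizations at all
    int noAuths = count(conn.createScanner("mytable", new Authorizations()));

    System.out.println("with auths " + auths + ": " + withAuths
        + ", with no auths: " + noAuths);
  }

  static int count(Scanner scanner) {
    int n = 0;
    for (Entry<Key,Value> entry : scanner)
      n++;
    return n;
  }
}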

On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell
<je...@gmail.com>wrote:

> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
> <je...@gmail.com> wrote:
> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <er...@gmail.com>
> wrote:
> >> It's possible that the bulkImport client code may be using the hadoop
> config
> >> from the java classpath, which is how it used to work.  I'll investigate
> it
> >> tomorrow.
> >
> > I chased that down in the debugging, unless your thinking that the
> > hadoop being used in the call to getGlobs on that file pattern my be
> > hitting a different fs...that might explain it.
>
> no no...because the copy from the import directory is actually being
> copied so I don't think its a different FS issue.
>
>
> > cheers,
> > jesse
> >
> >> See ACCUMULO-43.
> >> -Eric
> >>
> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <jo...@ugov.gov>
> wrote:
> >>>
> >>> ----- Original Message -----
> >>> | From: "Jesse McConnell" <je...@gmail.com>
> >>> | To: accumulo-user@incubator.apache.org
> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
> >>> | Subject: Re: TableOperations import directory
> >>> | We are trying to run a unit test type scenario were we have a m-r
> >>> | process that generates input to the bulk import process in a local
> >>> | hadoop fs, and then copy the resulting output to a directory on the
> >>> | dfs that can then be used as input to the
> >>> | TableOperations.importDirectory() call.
> >>> |
> >>> | Is this a problem? Because the examples seem to work when we run them
> >>> | with the -examples artifact in the lib dir but when that file is
> >>> | removed and we try and run it in the same sort of way as the unit
> test
> >>> | above it doesn't work.
> >>> |
> >>> | Is there some sort of requirement that the data being generated for
> >>> | the import be going to the import directory of the bulk load process
> >>> | _have_ to be on the dfs?
> >>> |
> >>> | In other words, is a bad assumption that I could take data from
> hadoop
> >>> | dfs X and copy it over to hadoop dfs Y and then import it with the
> >>> | importDirectory command?
> >>> |
> >>> | Does the job metadata or the job configuration play any role in the
> >>> | bulk import process?
> >>> |
> >>> | cheers,
> >>> | jesse
> >>> |
> >>> | --
> >>> | jesse mcconnell
> >>> | jesse.mcconnell@gmail.com
> >>> |
> >>>
> >>> Everything in your process sounds correct. There is no data besides the
> >>> file used for the bulk import process. So generating on one hdfs and
> >>> transferring it over should pose no problems.
> >>>
> >>> Can you elaborate on the differences in regard to the examples
> artifact?
> >>> The examples should have no effect on any regular aspects of the
> system. So
> >>> if you could elaborate on how it doesn't work, that would be a good
> start.
> >>> Like does it error, silently fail, etc.? One good place to look is the
> >>> monitor page, as that will spit up any errors/warnings you get from the
> >>> tservers, so that way if there's an error rising up from the rfiles
> your
> >>> using, you should be able to see that so we can correct it.
> >>>
> >>> Also, glancing at your other emails, are you putting them in the
> Accumulo
> >>> directory in hdfs when you move them to the true hdfs instance? I
> highly
> >>> suggest you don't do that if you are. Let bulk import put them in the
> right
> >>> place in the accumulo directory. It will just keep things simpler.
> >>
> >>
> >
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
<je...@gmail.com> wrote:
> On Mon, Oct 17, 2011 at 18:29, Eric Newton <er...@gmail.com> wrote:
>> It's possible that the bulkImport client code may be using the hadoop config
>> from the java classpath, which is how it used to work.  I'll investigate it
>> tomorrow.
>
> I chased that down in the debugging, unless your thinking that the
> hadoop being used in the call to getGlobs on that file pattern my be
> hitting a different fs...that might explain it.

no no...the file from the import directory is actually getting
copied, so I don't think it's a different FS issue.


> cheers,
> jesse
>
>> See ACCUMULO-43.
>> -Eric
>>
>> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <jo...@ugov.gov> wrote:
>>>
>>> ----- Original Message -----
>>> | From: "Jesse McConnell" <je...@gmail.com>
>>> | To: accumulo-user@incubator.apache.org
>>> | Sent: Monday, October 17, 2011 6:15:43 PM
>>> | Subject: Re: TableOperations import directory
>>> | We are trying to run a unit test type scenario were we have a m-r
>>> | process that generates input to the bulk import process in a local
>>> | hadoop fs, and then copy the resulting output to a directory on the
>>> | dfs that can then be used as input to the
>>> | TableOperations.importDirectory() call.
>>> |
>>> | Is this a problem? Because the examples seem to work when we run them
>>> | with the -examples artifact in the lib dir but when that file is
>>> | removed and we try and run it in the same sort of way as the unit test
>>> | above it doesn't work.
>>> |
>>> | Is there some sort of requirement that the data being generated for
>>> | the import be going to the import directory of the bulk load process
>>> | _have_ to be on the dfs?
>>> |
>>> | In other words, is a bad assumption that I could take data from hadoop
>>> | dfs X and copy it over to hadoop dfs Y and then import it with the
>>> | importDirectory command?
>>> |
>>> | Does the job metadata or the job configuration play any role in the
>>> | bulk import process?
>>> |
>>> | cheers,
>>> | jesse
>>> |
>>> | --
>>> | jesse mcconnell
>>> | jesse.mcconnell@gmail.com
>>> |
>>>
>>> Everything in your process sounds correct. There is no data besides the
>>> file used for the bulk import process. So generating on one hdfs and
>>> transferring it over should pose no problems.
>>>
>>> Can you elaborate on the differences in regard to the examples artifact?
>>> The examples should have no effect on any regular aspects of the system. So
>>> if you could elaborate on how it doesn't work, that would be a good start.
>>> Like does it error, silently fail, etc.? One good place to look is the
>>> monitor page, as that will spit up any errors/warnings you get from the
>>> tservers, so that way if there's an error rising up from the rfiles your
>>> using, you should be able to see that so we can correct it.
>>>
>>> Also, glancing at your other emails, are you putting them in the Accumulo
>>> directory in hdfs when you move them to the true hdfs instance? I highly
>>> suggest you don't do that if you are. Let bulk import put them in the right
>>> place in the accumulo directory. It will just keep things simpler.
>>
>>
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
On Mon, Oct 17, 2011 at 18:29, Eric Newton <er...@gmail.com> wrote:
> It's possible that the bulkImport client code may be using the hadoop config
> from the java classpath, which is how it used to work.  I'll investigate it
> tomorrow.

I chased that down in the debugging, unless you're thinking that the
hadoop FileSystem being used in the call to getGlobs on that file
pattern may be hitting a different FS...that might explain it.

cheers,
jesse

> See ACCUMULO-43.
> -Eric
>
> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <jo...@ugov.gov> wrote:
>>
>> ----- Original Message -----
>> | From: "Jesse McConnell" <je...@gmail.com>
>> | To: accumulo-user@incubator.apache.org
>> | Sent: Monday, October 17, 2011 6:15:43 PM
>> | Subject: Re: TableOperations import directory
>> | We are trying to run a unit test type scenario were we have a m-r
>> | process that generates input to the bulk import process in a local
>> | hadoop fs, and then copy the resulting output to a directory on the
>> | dfs that can then be used as input to the
>> | TableOperations.importDirectory() call.
>> |
>> | Is this a problem? Because the examples seem to work when we run them
>> | with the -examples artifact in the lib dir but when that file is
>> | removed and we try and run it in the same sort of way as the unit test
>> | above it doesn't work.
>> |
>> | Is there some sort of requirement that the data being generated for
>> | the import be going to the import directory of the bulk load process
>> | _have_ to be on the dfs?
>> |
>> | In other words, is a bad assumption that I could take data from hadoop
>> | dfs X and copy it over to hadoop dfs Y and then import it with the
>> | importDirectory command?
>> |
>> | Does the job metadata or the job configuration play any role in the
>> | bulk import process?
>> |
>> | cheers,
>> | jesse
>> |
>> | --
>> | jesse mcconnell
>> | jesse.mcconnell@gmail.com
>> |
>>
>> Everything in your process sounds correct. There is no data besides the
>> file used for the bulk import process. So generating on one hdfs and
>> transferring it over should pose no problems.
>>
>> Can you elaborate on the differences in regard to the examples artifact?
>> The examples should have no effect on any regular aspects of the system. So
>> if you could elaborate on how it doesn't work, that would be a good start.
>> Like does it error, silently fail, etc.? One good place to look is the
>> monitor page, as that will spit up any errors/warnings you get from the
>> tservers, so that way if there's an error rising up from the rfiles your
>> using, you should be able to see that so we can correct it.
>>
>> Also, glancing at your other emails, are you putting them in the Accumulo
>> directory in hdfs when you move them to the true hdfs instance? I highly
>> suggest you don't do that if you are. Let bulk import put them in the right
>> place in the accumulo directory. It will just keep things simpler.
>
>

Re: TableOperations import directory

Posted by Eric Newton <er...@gmail.com>.
It's possible that the bulkImport client code may be using the hadoop config
from the java classpath, which is how it used to work.  I'll investigate it
tomorrow.

See ACCUMULO-43.

-Eric

On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <jo...@ugov.gov> wrote:

>
> ----- Original Message -----
> | From: "Jesse McConnell" <je...@gmail.com>
> | To: accumulo-user@incubator.apache.org
> | Sent: Monday, October 17, 2011 6:15:43 PM
> | Subject: Re: TableOperations import directory
> | We are trying to run a unit test type scenario were we have a m-r
> | process that generates input to the bulk import process in a local
> | hadoop fs, and then copy the resulting output to a directory on the
> | dfs that can then be used as input to the
> | TableOperations.importDirectory() call.
> |
> | Is this a problem? Because the examples seem to work when we run them
> | with the -examples artifact in the lib dir but when that file is
> | removed and we try and run it in the same sort of way as the unit test
> | above it doesn't work.
> |
> | Is there some sort of requirement that the data being generated for
> | the import be going to the import directory of the bulk load process
> | _have_ to be on the dfs?
> |
> | In other words, is a bad assumption that I could take data from hadoop
> | dfs X and copy it over to hadoop dfs Y and then import it with the
> | importDirectory command?
> |
> | Does the job metadata or the job configuration play any role in the
> | bulk import process?
> |
> | cheers,
> | jesse
> |
> | --
> | jesse mcconnell
> | jesse.mcconnell@gmail.com
> |
>
> Everything in your process sounds correct. There is no data besides the
> file used for the bulk import process. So generating on one hdfs and
> transferring it over should pose no problems.
>
> Can you elaborate on the differences in regard to the examples artifact?
> The examples should have no effect on any regular aspects of the system. So
> if you could elaborate on how it doesn't work, that would be a good start.
> Like does it error, silently fail, etc.? One good place to look is the
> monitor page, as that will spit up any errors/warnings you get from the
> tservers, so that way if there's an error rising up from the rfiles your
> using, you should be able to see that so we can correct it.
>
> Also, glancing at your other emails, are you putting them in the Accumulo
> directory in hdfs when you move them to the true hdfs instance? I highly
> suggest you don't do that if you are. Let bulk import put them in the right
> place in the accumulo directory. It will just keep things simpler.
>

Re: TableOperations import directory

Posted by John W Vines <jo...@ugov.gov>.
----- Original Message -----
| From: "Jesse McConnell" <je...@gmail.com>
| To: accumulo-user@incubator.apache.org
| Sent: Monday, October 17, 2011 6:15:43 PM
| Subject: Re: TableOperations import directory
| We are trying to run a unit test type scenario were we have a m-r
| process that generates input to the bulk import process in a local
| hadoop fs, and then copy the resulting output to a directory on the
| dfs that can then be used as input to the
| TableOperations.importDirectory() call.
| 
| Is this a problem? Because the examples seem to work when we run them
| with the -examples artifact in the lib dir but when that file is
| removed and we try and run it in the same sort of way as the unit test
| above it doesn't work.
| 
| Is there some sort of requirement that the data being generated for
| the import be going to the import directory of the bulk load process
| _have_ to be on the dfs?
| 
| In other words, is a bad assumption that I could take data from hadoop
| dfs X and copy it over to hadoop dfs Y and then import it with the
| importDirectory command?
| 
| Does the job metadata or the job configuration play any role in the
| bulk import process?
| 
| cheers,
| jesse
| 
| --
| jesse mcconnell
| jesse.mcconnell@gmail.com
| 

Everything in your process sounds correct. Nothing besides the files themselves is used for the bulk import process, so generating on one hdfs and transferring it over should pose no problems.

Can you elaborate on the differences in regard to the examples artifact? The examples should have no effect on any regular aspects of the system, so if you could describe how it doesn't work, that would be a good start: does it error, fail silently, etc.? One good place to look is the monitor page, as that will spit up any errors/warnings you get from the tservers; that way, if there's an error rising up from the rfiles you're using, you should be able to see it so we can correct it.

Also, glancing at your other emails, are you putting them in the Accumulo directory in hdfs when you move them to the true hdfs instance? I highly suggest you don't do that if you are. Let bulk import put them in the right place in the accumulo directory. It will just keep things simpler.
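
For reference, a minimal end-to-end sketch of that flow is below.  The
namenode URIs, paths, instance name and credentials are placeholders,
and the importDirectory signature varies between releases (this uses a
(table, dir, failuresDir, setTime) form), so adjust to whichever
variant your client API exposes.

import java.net.URI;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class BulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // copy the m-r output from the test HDFS to the cluster HDFS, into a
    // scratch directory that is NOT under /accumulo
    FileSystem src = FileSystem.get(URI.create("hdfs://testnn:9000"), conf);
    FileSystem dst = FileSystem.get(URI.create("hdfs://clusternn:9000"), conf);
    FileUtil.copy(src, new Path("/tmp/job-output"),
                  dst, new Path("/bulk/input"), false, conf);

    // the failures directory should exist (and be empty) before the call
    dst.mkdirs(new Path("/bulk/failures"));

    // placeholder instance/credentials
    Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
        .getConnector("root", "secret".getBytes());
    conn.tableOperations().importDirectory("mytable", "/bulk/input",
        "/bulk/failures", false);
  }
}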

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
We are trying to run a unit-test-type scenario where we have an m-r
process that generates input to the bulk import process in a local
hadoop fs, and we then copy the resulting output to a directory on
the dfs that can be used as input to the
TableOperations.importDirectory() call.

Is this a problem?  The examples seem to work when we run them with
the -examples artifact in the lib dir, but when that file is removed
and we try to run it in the same sort of way as the unit test above,
it doesn't work.

Is there some sort of requirement that the data being generated for
the bulk load process's import directory _have_ to be generated on
the dfs?

In other words, is it a bad assumption that I could take data from
hadoop dfs X, copy it over to hadoop dfs Y, and then import it with
the importDirectory command?

Does the job metadata or the job configuration play any role in the
bulk import process?

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Mon, Oct 17, 2011 at 12:44, Jesse McConnell
<je...@gmail.com> wrote:
> So its clear...
>
> [rune]/hadoop-0.20.2> ./bin/hadoop dfs -ls
> /*/tables/1k/bulk_ca71f2d0-be9a-4174-9ea2-d24cb509c7e6
> Found 2 items
> -rw-r--r--   3 jesse supergroup     543672 2011-10-17 12:40
> /*/tables/1k/bulk_ca71f2d0-be9a-4174-9ea2-d24cb509c7e6/00000_00000.rf
> -rw-r--r--   3 jesse supergroup         13 2011-10-17 12:40
> /*/tables/1k/bulk_ca71f2d0-be9a-4174-9ea2-d24cb509c7e6/processing_proc_1318873259833
>
> * replacing the version we are using atm
>
> That is what it looks like after the bulk prepare has copied things
> over to useful locations, and it doesn't seem to find any map files
> out of this bulkDir directory.
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Mon, Oct 17, 2011 at 12:30, Jesse McConnell
> <je...@gmail.com> wrote:
>> Ok, with that in mind, when stepping through the BulkImportLoader it
>> seems that after copying over it doesn't actually _find_ any files to
>> import, at least its not reporting meta file information on those
>> calls, and there is no files in the map file iterator..
>>
>> cheers,
>> jesse
>>
>> --
>> jesse mcconnell
>> jesse.mcconnell@gmail.com
>>
>>
>>
>> On Mon, Oct 17, 2011 at 12:28, Billie J Rinaldi
>> <bi...@ugov.gov> wrote:
>>> On Monday, October 17, 2011 1:20:41 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>>>> What specifically is it looking for in terms of file to import from?
>>>> I know its been said that it should be an rfile, but does that mean it
>>>> should have a .rf extension?
>>>
>>> Yes, the files should have an .rf extension.
>>>
>>> Billie
>>>
>>
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
So it's clear...

[rune]/hadoop-0.20.2> ./bin/hadoop dfs -ls
/*/tables/1k/bulk_ca71f2d0-be9a-4174-9ea2-d24cb509c7e6
Found 2 items
-rw-r--r--   3 jesse supergroup     543672 2011-10-17 12:40
/*/tables/1k/bulk_ca71f2d0-be9a-4174-9ea2-d24cb509c7e6/00000_00000.rf
-rw-r--r--   3 jesse supergroup         13 2011-10-17 12:40
/*/tables/1k/bulk_ca71f2d0-be9a-4174-9ea2-d24cb509c7e6/processing_proc_1318873259833

* replacing the version we are using atm

That is what it looks like after the bulk prepare has copied things
over to the right locations, and yet it doesn't seem to find any map
files in this bulkDir directory.
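
One quick check is to run that sort of glob by hand and see what it
matches in the bulk directory; a sketch is below, where the path and
the [0-9]* pattern are only illustrative (taken from the listing
above), not necessarily what the server code uses.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // placeholder path modeled on the listing above
    Path pattern = new Path("/accumulo/tables/1k/bulk_*/[0-9]*");
    FileStatus[] matches = fs.globStatus(pattern);
    if (matches == null || matches.length == 0)
      System.out.println("glob matched nothing; the loader would see no map files");
    else
      for (FileStatus st : matches)
        System.out.println(st.getPath() + "  " + st.getLen());
  }
}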

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Mon, Oct 17, 2011 at 12:30, Jesse McConnell
<je...@gmail.com> wrote:
> Ok, with that in mind, when stepping through the BulkImportLoader it
> seems that after copying over it doesn't actually _find_ any files to
> import, at least its not reporting meta file information on those
> calls, and there is no files in the map file iterator..
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Mon, Oct 17, 2011 at 12:28, Billie J Rinaldi
> <bi...@ugov.gov> wrote:
>> On Monday, October 17, 2011 1:20:41 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>>> What specifically is it looking for in terms of file to import from?
>>> I know its been said that it should be an rfile, but does that mean it
>>> should have a .rf extension?
>>
>> Yes, the files should have an .rf extension.
>>
>> Billie
>>
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
Ok, with that in mind: stepping through the BulkImportLoader, it
seems that after copying over it doesn't actually _find_ any files to
import; at least it's not reporting meta file information on those
calls, and there are no files in the map file iterator...

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Mon, Oct 17, 2011 at 12:28, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> On Monday, October 17, 2011 1:20:41 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>> What specifically is it looking for in terms of file to import from?
>> I know its been said that it should be an rfile, but does that mean it
>> should have a .rf extension?
>
> Yes, the files should have an .rf extension.
>
> Billie
>

Re: TableOperations import directory

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Monday, October 17, 2011 1:20:41 PM, "Jesse McConnell" <je...@gmail.com> wrote:
> What specifically is it looking for in terms of file to import from?
> I know its been said that it should be an rfile, but does that mean it
> should have a .rf extension?

Yes, the files should have an .rf extension.

Billie

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
Debugging this thing, it seems that it's not actually finding
anything to import after it has copied stuff over.

What specifically is it looking for in terms of files to import?
I know it's been said that they should be rfiles, but does that mean
they should have a .rf extension?

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Mon, Oct 17, 2011 at 12:16, Jesse McConnell
<je...@gmail.com> wrote:
> Well, it seems like it is successful only there is no data, going to
> the web ui only shows no new data in the table.  So it _appears_ that
> it is failing silently on something.
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Mon, Oct 17, 2011 at 11:52, Billie J Rinaldi
> <bi...@ugov.gov> wrote:
>> On Monday, October 17, 2011 12:24:58 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>>> ok, we have progress on the bulk import...but have hit another bit of
>>> a black hole. Once we started coping the m-r output from the local
>>> dfs we generated the rfiles to we now copy them over to the hdfs so
>>> the bulk load call can see them properly (we'll look at just m-r to
>>> the right dfs later). After debugging the client connection we then
>>> lead into the server side of the bulk import connection...which lead
>>> us to rename of the bulk import files to [0-9].map and now we see them
>>> getting picked up and copied over to the server side of the flow...but
>>> we can't seem to find any visibility on this Import Map Files section
>>> that goes too fast. Is there some debug we can turn on to see what
>>> its doing in there? Use the *FileOutputFormat to write the rfiles now
>>> since that seemed to be what the example was doing...
>>
>> What is happening at this point?  Does it look like the import is succeeding, but you can't see the data?
>>
>> Billie
>>
>

Re: TableOperations import directory

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Monday, October 17, 2011 1:16:49 PM, "Jesse McConnell" <je...@gmail.com> wrote:
> Well, it seems like it is successful only there is no data, going to
> the web ui only shows no new data in the table. So it _appears_ that
> it is failing silently on something.

The monitor page entry counts are only estimates, and at this point they do not include newly bulk imported data.  (This is so the tserver doesn't have to open every file to bring it online.)  You should be able to see the data when you do a scan, and when the files get compacted you'll see the updated counts on the monitor page.  You can trigger compaction by running "compact -t tablename" in the shell.

Billie

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
Well, it seems like it is successful, only there is no data; going
to the web ui shows no new data in the table.  So it _appears_ that
it is failing silently on something.

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Mon, Oct 17, 2011 at 11:52, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> On Monday, October 17, 2011 12:24:58 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>> ok, we have progress on the bulk import...but have hit another bit of
>> a black hole. Once we started coping the m-r output from the local
>> dfs we generated the rfiles to we now copy them over to the hdfs so
>> the bulk load call can see them properly (we'll look at just m-r to
>> the right dfs later). After debugging the client connection we then
>> lead into the server side of the bulk import connection...which lead
>> us to rename of the bulk import files to [0-9].map and now we see them
>> getting picked up and copied over to the server side of the flow...but
>> we can't seem to find any visibility on this Import Map Files section
>> that goes too fast. Is there some debug we can turn on to see what
>> its doing in there? Use the *FileOutputFormat to write the rfiles now
>> since that seemed to be what the example was doing...
>
> What is happening at this point?  Does it look like the import is succeeding, but you can't see the data?
>
> Billie
>

Re: TableOperations import directory

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Monday, October 17, 2011 12:24:58 PM, "Jesse McConnell" <je...@gmail.com> wrote:
> ok, we have progress on the bulk import...but have hit another bit of
> a black hole. Once we started coping the m-r output from the local
> dfs we generated the rfiles to we now copy them over to the hdfs so
> the bulk load call can see them properly (we'll look at just m-r to
> the right dfs later). After debugging the client connection we then
> lead into the server side of the bulk import connection...which lead
> us to rename of the bulk import files to [0-9].map and now we see them
> getting picked up and copied over to the server side of the flow...but
> we can't seem to find any visibility on this Import Map Files section
> that goes too fast. Is there some debug we can turn on to see what
> its doing in there? Use the *FileOutputFormat to write the rfiles now
> since that seemed to be what the example was doing...

What is happening at this point?  Does it look like the import is succeeding, but you can't see the data?

Billie

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
ok, we have progress on the bulk import...but have hit another bit of
a black hole.  We generate the rfiles with the m-r job on the local
dfs and then copy the output over to the hdfs so the bulk load call
can see it properly (we'll look at just running the m-r against the
right dfs later).  After debugging the client connection we then
followed the server side of the bulk import...which led us to rename
the bulk import files to [0-9].map, and now we see them getting picked
up and copied over to the server side of the flow...but we can't seem
to get any visibility into this Import Map Files section, which goes
by too fast.  Is there some debug we can turn on to see what it's
doing in there?  We use the *FileOutputFormat to write the rfiles now
since that seemed to be what the example was doing...

 Estimated map files sizes in   0.00 secs
 BULK IMPORT TIMING STATISTICS
 Move map files       :       0.06 secs  33.33%
 Examine map files    :       0.01 secs   4.52%
 Query !METADATA      :       0.08 secs  43.50%
 Import Map Files     :       0.00 secs   0.00%
 Sleep                :       0.00 secs   0.00%
 Misc                 :       0.03 secs  18.64%
 Total                :       0.18 secs

Any tips on where we can look to sort out what might be wrong with
these map files that are getting copied over?
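
For reference, the job setup side of that looks roughly like the
sketch below; the class name and output path are placeholders, the
mapper/reducer are elided, and this assumes the AccumuloFileOutputFormat
that the bulk ingest example uses.

import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class RFileJobSetup {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "generate rfiles");
    job.setJarByClass(RFileJobSetup.class);

    // mapper/reducer omitted; the reducer must emit Key/Value pairs in
    // sorted order, since the rfile writer requires sorted keys
    job.setOutputKeyClass(Key.class);
    job.setOutputValueClass(Value.class);

    // writes .rf files that the bulk importer can pick up
    job.setOutputFormatClass(AccumuloFileOutputFormat.class);
    AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/bulk-output"));

    job.waitForCompletion(true);
  }
}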

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Fri, Oct 14, 2011 at 16:51, Jesse McConnell
<je...@gmail.com> wrote:
> Oh I wish...the logs are either very verbose with information
> unrelated to the issue...or eerily silent :)
>
> Is there a particular file that might have more info related to this
> then another?
>
> Also, is there a _strict_ format of the data in the file being bulk imported?
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Fri, Oct 14, 2011 at 16:36, Billie J Rinaldi
> <bi...@ugov.gov> wrote:
>> On Friday, October 14, 2011 4:27:55 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>>> What is the format of the files in the directory that you are supposed
>>> to be importing from?
>>>
>>> I am currently getting a lovely 'unknown result' error message out of
>>> the ClientService.
>>
>> Also, there may be more informative errors in the logs, and if so, they are probably being forwarded to your monitor page.  Are there any relevant-looking errors under "Log Events" on the monitor page?
>>
>> Billie
>>
>

Re: TableOperations import directory

Posted by Jesse McConnell <je...@gmail.com>.
Oh I wish...the logs are either very verbose with information
unrelated to the issue...or eerily silent :)

Is there a particular file that might have more info related to this
than another?

Also, is there a _strict_ format of the data in the file being bulk imported?

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Fri, Oct 14, 2011 at 16:36, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> On Friday, October 14, 2011 4:27:55 PM, "Jesse McConnell" <je...@gmail.com> wrote:
>> What is the format of the files in the directory that you are supposed
>> to be importing from?
>>
>> I am currently getting a lovely 'unknown result' error message out of
>> the ClientService.
>
> Also, there may be more informative errors in the logs, and if so, they are probably being forwarded to your monitor page.  Are there any relevant-looking errors under "Log Events" on the monitor page?
>
> Billie
>

Re: TableOperations import directory

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Friday, October 14, 2011 4:27:55 PM, "Jesse McConnell" <je...@gmail.com> wrote:
> What is the format of the files in the directory that you are supposed
> to be importing from?
>
> I am currently getting a lovely 'unknown result' error message out of
> the ClientService.

Also, there may be more informative errors in the logs, and if so, they are probably being forwarded to your monitor page.  Are there any relevant-looking errors under "Log Events" on the monitor page?

Billie

Re: TableOperations import directory

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Friday, October 14, 2011 4:27:55 PM, "Jesse McConnell" <je...@gmail.com> wrote:
> What is the format of the files in the directory that you are supposed
> to be importing from?
> 
> I am currently getting a lovely 'unknown result' error message out of
> the ClientService.

RFiles are definitely the right format.  This isn't my area of expertise, but I am wondering if it is some kind of permission problem, like hdfs permissions on the files or table permissions for the accumulo user.
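
If permissions turn out to be the culprit, a sketch along these lines
can inspect and (for a test run) loosen both sides; the paths, user
names and instance details are placeholders, and whether plain WRITE
is the table permission bulk import needs may depend on the release.

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.security.TablePermission;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionCheck {
  public static void main(String[] args) throws Exception {
    // hdfs side: can the tablet servers read the staged files?
    FileSystem fs = FileSystem.get(new Configuration());
    Path staged = new Path("/bulk/input");   // placeholder path
    for (FileStatus st : fs.listStatus(staged))
      System.out.println(st.getPath() + " " + st.getPermission() + " " + st.getOwner());
    // loosen the directory for a test run only; tighten again afterwards
    fs.setPermission(staged, new FsPermission((short) 0777));

    // accumulo side: the importing user needs write-level permission on the table
    Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
        .getConnector("root", "secret".getBytes());
    conn.securityOperations().grantTablePermission("jesse", "mytable",
        TablePermission.WRITE);
  }
}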

Billie