Posted to user@hbase.apache.org by Tao Xiao <xi...@gmail.com> on 2013/12/16 11:47:58 UTC

Why so many unexpected files like partitions_xxxx are created?

I imported data into HBase via bulk load, but after that I found that many
unexpected files had been created in the HDFS directory /user/root/, and they
look like these:

/user/root/partitions_fd74866b-6588-468d-8463-474e202db070
/user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
/user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
/user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
/user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
/user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
/user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
/user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
/user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
/user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
/user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
/user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
/user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
... ...
... ...


It seems that they are HFiles, but I don't know why they were created there.

I bulk load data into HBase in the following way:

Firstly, I wrote a MapReduce program which only has map tasks. The map
tasks read some text data and emit it in the form of a RowKey and a
KeyValue. The following is my program:

        @Override
        protected void map(NullWritable NULL, GtpcV1SignalWritable signal,
                Context ctx) throws InterruptedException, IOException {
            String strRowkey = xxx;
            byte[] rowkeyBytes = Bytes.toBytes(strRowkey);

            rowkey.set(rowkeyBytes);

            part1.init(signal);
            part2.init(signal);

            // Emit one KeyValue per column family, both under the same row key.
            KeyValue kv = new KeyValue(rowkeyBytes, Family_A, Qualifier_Q,
                    part1.serialize());
            ctx.write(rowkey, kv);

            kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
                    part2.serialize());
            ctx.write(rowkey, kv);
        }


After the MR job finished, there were several HFiles generated in the
output directory I specified.
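
My driver configures the job roughly as follows (a simplified sketch; the
class names and path variables are placeholders, not my exact code):

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load-prepare");
    job.setJarByClass(MyBulkLoadJob.class);
    job.setMapperClass(MyMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    FileInputFormat.addInputPath(job, new Path(inputDir));
    FileOutputFormat.setOutputPath(job, new Path(hfileOutputDir));

    // Wires up HFileOutputFormat and total-order partitioning so the
    // generated HFiles line up with the regions of the target table.
    HTable table = new HTable(conf, "MyTable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    job.waitForCompletion(true);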

Then I began to load these HFiles into HBase using the following command:
       hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles HFiles-Dir MyTable
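
(The same step can also be run from Java; a minimal sketch against the
0.94-era client API, reusing the table name and directory from the command
above:)

    Configuration conf = HBaseConfiguration.create();
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    HTable table = new HTable(conf, "MyTable");
    // Moves the prepared HFiles into the matching regions of MyTable.
    loader.doBulkLoad(new Path("HFiles-Dir"), table);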

Finally, I could see that the data were indeed loaded into the table in
HBase.


But I could also see that many unexpected files had been generated in
the HDFS directory /user/root/, just as I mentioned at the beginning of
this mail, and I did not specify that any files be produced in this
directory.

What happened? Can anyone tell me what these files are and who produced them?

Thanks

Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Tao Xiao <xi...@gmail.com>.
My HBase version is 0.94.12


2013/12/20 Tao Xiao <xi...@gmail.com>

> Hi Ted,
>      You asked me to check the log of LoadIncrementalHFiles to see what the
> error from the region server was, but where is the log of
> LoadIncrementalHFiles? Is it written into the log of the region server? It
> seems the region server works well.
>
>
>
>
> 2013/12/19 Ted Yu <yu...@gmail.com>
>
>> From the stack trace posted I saw:
>>
>> org.apache.commons.logging.impl.Log4JLogger.error(Log4JLogger.java:257)
>>     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:577)
>>
>> Assuming 0.94 is used, line 577 at the tip of 0.94 is:
>>         LOG.warn("Attempt to bulk load region containing "
>>             + Bytes.toStringBinary(first) + " into table "
>>
>> But the following should be the corresponding line w.r.t. stack trace:
>>     } catch (IOException e) {
>>       LOG.error("Encountered unrecoverable error from region server", e);
>>
>> Tao:
>> Can you check the log of LoadIncrementalHFiles to see what the error
>> from the region server was?
>>
>> As Jieshan said, checking region server log would reveal something.
>>
>> Cheers
>>
>>
>> On Tue, Dec 17, 2013 at 10:40 PM, Bijieshan <bi...@huawei.com> wrote:
>>
>> > It seems LoadIncrementalHFiles is still running. Can you run "jstack" on
>> > one RegionServer process as well?
>> >
>> > Which version are you using?
>> >
>> > Jieshan.
>> > -----Original Message-----
>> > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
>> > Sent: Wednesday, December 18, 2013 1:49 PM
>> > To: user@hbase.apache.org
>> > Subject: Re: Why so many unexpected files like partitions_xxxx are
>> created?
>> >
>> > I ran jstack on one such process and saw the following output in the
>> > terminal. I guess this info tells us that the processes started by the
>> > command "LoadIncrementalHFiles" never exit. Why didn't they exit after
>> > they finished running?
>> >
>> > ... ...
>> > ... ...
>> >
>> > "LoadIncrementalHFiles-0.LruBlockCache.EvictionThread" daemon prio=10 tid=0x000000004129c000 nid=0x2186 in Object.wait() [0x00007f53f3665000]
>> >    java.lang.Thread.State: WAITING (on object monitor)
>> >     at java.lang.Object.wait(Native Method)
>> >     - waiting on <0x000000075fcf3370> (a org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
>> >     at java.lang.Object.wait(Object.java:485)
>> >     at org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread.run(LruBlockCache.java:631)
>> >     - locked <0x000000075fcf3370> (a org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
>> >     at java.lang.Thread.run(Thread.java:662)
>> >
>> >    Locked ownable synchronizers:
>> >     - None
>> >
>> > "LoadIncrementalHFiles-3" prio=10 tid=0x00007f540ca55800 nid=0x2185 runnable [0x00007f53f3765000]
>> >    java.lang.Thread.State: RUNNABLE
>> >     at java.io.FileOutputStream.writeBytes(Native Method)
>> >     at java.io.FileOutputStream.write(FileOutputStream.java:282)
>> >     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>> >     - locked <0x0000000763e5af70> (a java.io.BufferedOutputStream)
>> >     at java.io.PrintStream.write(PrintStream.java:430)
>> >     - locked <0x0000000763d5b670> (a java.io.PrintStream)
>> >     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
>> >     at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
>> >     at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
>> >     - locked <0x0000000763d6c6d0> (a java.io.OutputStreamWriter)
>> >     at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:116)
>> >     at java.io.OutputStreamWriter.write(OutputStreamWriter.java:203)
>> >     at java.io.Writer.write(Writer.java:140)
>> >     at org.apache.log4j.helpers.QuietWriter.write(QuietWriter.java:48)
>> >     at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:317)
>> >     at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
>> >     at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>> >     - locked <0x0000000763d5fb90> (a org.apache.log4j.ConsoleAppender)
>> >     at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>> >     at org.apache.log4j.Category.callAppenders(Category.java:206)
>> >     - locked <0x0000000763d65fe8> (a org.apache.log4j.spi.RootLogger)
>> >     at org.apache.log4j.Category.forcedLog(Category.java:391)
>> >     at org.apache.log4j.Category.log(Category.java:856)
>> >     at org.apache.commons.logging.impl.Log4JLogger.error(Log4JLogger.java:257)
>> >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:577)
>> >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:316)
>> >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:314)
>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> >     at java.lang.Thread.run(Thread.java:662)
>> >
>> >    Locked ownable synchronizers:
>> >     - <0x000000075fe494c0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>> >
>> > ... ...
>> > ... ...
>> >
>> > "Reference Handler" daemon prio=10 tid=0x00007f540c138800 nid=0x2172 in Object.wait() [0x00007f5401355000]
>> >    java.lang.Thread.State: WAITING (on object monitor)
>> >     at java.lang.Object.wait(Native Method)
>> >     - waiting on <0x0000000763d51078> (a java.lang.ref.Reference$Lock)
>> >     at java.lang.Object.wait(Object.java:485)
>> >     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>> >     - locked <0x0000000763d51078> (a java.lang.ref.Reference$Lock)
>> >
>> >    Locked ownable synchronizers:
>> >     - None
>> >
>> > "main" prio=10 tid=0x00007f540c00e000 nid=0x216a waiting on condition [0x00007f54114ac000]
>> >    java.lang.Thread.State: WAITING (parking)
>> >     at sun.misc.Unsafe.park(Native Method)
>> >     - parking to wait for  <0x000000075ea67310> (a java.util.concurrent.FutureTask$Sync)
>> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>> >     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>> >     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:326)
>> >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:261)
>> >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:780)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:785)
>> >
>> >    Locked ownable synchronizers:
>> >     - None
>> >
>> > "VM Thread" prio=10 tid=0x00007f540c132000 nid=0x2170 runnable
>> >
>> > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f540c01c800 nid=0x216b runnable
>> >
>> > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f540c01e800 nid=0x216c runnable
>> >
>> > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f540c020000 nid=0x216d runnable
>> >
>> > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f540c022000 nid=0x216e runnable
>> >
>> > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f540c0b1000 nid=0x216f runnable
>> >
>> > "VM Periodic Task Thread" prio=10 tid=0x00007f540c16b000 nid=0x217a waiting on condition
>> >
>> > JNI global references: 1118
>> >
>> >
>> > 2013/12/18 Ted Yu <yu...@gmail.com>
>> >
>> > > Tao:
>> > > Can you jstack one such process next time you see them hanging ?
>> > >
>> > > Thanks
>> > >
>> > >
>> > > On Tue, Dec 17, 2013 at 6:31 PM, Tao Xiao <xi...@gmail.com>
>> > > wrote:
>> > >
>> > > > BTW, I noticed another problem. I bulk load data into HBase every
>> > > > five minutes, but I found that whenever the following command was
>> > > > executed
>> > > >     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
>> > > > HFiles-Dir  MyTable
>> > > >
>> > > > a new process called "LoadIncrementalHFiles" appeared.
>> > > >
>> > > > I can see many processes called "LoadIncrementalHFiles" using the
>> > > > command "jps" in the terminal. Why are these processes still there
>> > > > even after the command that bulk loads HFiles into HBase has finished
>> > > > executing? I have to kill them myself.
>> > > >
>> > > >
>> > > > 2013/12/17 Bijieshan <bi...@huawei.com>
>> > > >
>> > > > > Yes, it should be cleaned up, but that cleanup is not included in
>> > > > > the current code, in my understanding.
>> > > > >
>> > > > > Jieshan.
>> > > > > -----Original Message-----
>> > > > > From: Ted Yu [mailto:yuzhihong@gmail.com]
>> > > > > Sent: Tuesday, December 17, 2013 10:55 AM
>> > > > > To: user@hbase.apache.org
>> > > > > Subject: Re: Why so many unexpected files like partitions_xxxx are
>> > > > created?
>> > > > >
>> > > > > Should the bulk load task clean up partitions_xxxx upon completion?
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > >
>> > > > > On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com>
>> > > wrote:
>> > > > >
>> > > > > > >  I think I should delete these files immediately after I have
>> > > > > > > finished bulk loading data into HBase since they are useless at
>> > > > > > > that time, right?
>> > > > > >
>> > > > > > Ya, I think so. They are useless once the bulk load task has
>> > > > > > finished.
>> > > > > >
>> > > > > > Jieshan.
>> > > > > > -----Original Message-----
>> > > > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
>> > > > > > Sent: Tuesday, December 17, 2013 9:34 AM
>> > > > > > To: user@hbase.apache.org
>> > > > > > Subject: Re: Why so many unexpected files like partitions_xxxx
>> > > > > > are
>> > > > > created?
>> > > > > >
>> > > > > > Indeed, these files are produced by
>> > > > > > org.apache.hadoop.hbase.mapreduce.HFileOutputFormat in the
>> > > > > > directory returned by job.getWorkingDirectory(), and I think I
>> > > > > > should delete these files immediately after I have finished bulk
>> > > > > > loading data into HBase since they are useless at that time,
>> > > > > > right?
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > 2013/12/16 Bijieshan <bi...@huawei.com>
>> > > > > >
>> > > > > > > The reduce partition information is stored in this
>> > > > > > > partitions_XXXX file. See the code below:
>> > > > > > >
>> > > > > > > HFileOutputFormat#configureIncrementalLoad:
>> > > > > > >         .....................
>> > > > > > >     Path partitionsPath = new Path(job.getWorkingDirectory(),
>> > > > > > >                                    "partitions_" + UUID.randomUUID());
>> > > > > > >     LOG.info("Writing partition information to " + partitionsPath);
>> > > > > > >
>> > > > > > >     FileSystem fs = partitionsPath.getFileSystem(conf);
>> > > > > > >     writePartitions(conf, partitionsPath, startKeys);
>> > > > > > >         .....................
>> > > > > > >
>> > > > > > > Hoping it helps.
>> > > > > > >
>> > > > > > > Jieshan

Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Tao Xiao <xi...@gmail.com>.
Hi, the cluster was migrated to another city, so I had to wait a long time
before I could check the log. The reason the "LoadIncrementalHFiles" process
never exited is that I had deleted an HFile by mistake. After I fixed my
code, the process exited soon after the data was loaded into HBase.
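
As for the leftover partitions_xxxx files themselves, they are safe to delete
once a load has finished (as Jieshan confirmed above). A minimal cleanup
sketch, assuming the default /user/root working directory:

    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // Remove the partition files left behind in the job working directory.
    FileStatus[] leftovers = fs.globStatus(new Path("/user/root/partitions_*"));
    if (leftovers != null) {
        for (FileStatus f : leftovers) {
            fs.delete(f.getPath(), false); // plain files, no recursion needed
        }
    }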



Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Ted Yu <yu...@gmail.com>.
    } catch (IOException e) {
      LOG.error("Encountered unrecoverable error from region server", e);

The IOException came from the region server. During execution of
LoadIncrementalHFiles, there was some error on the region server.

Cheers



Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Tao Xiao <xi...@gmail.com>.
Hi Ted,
     You asked me to check the log of LoadIncrementalHFiles to see what the
error from the region server was, but where is the log of
LoadIncrementalHFiles? Is it written into the log of the region server? It
seems the region server works well.




> >     - locked <0x0000000763d51078> (a java.lang.ref.Reference$Lock)
> >
> >    Locked ownable synchronizers:
> >     - None
> >
> > "main" prio=10 tid=0x00007f540c00e000 nid=0x216a waiting on condition
> > [0x00007f54114ac000]
> >    java.lang.Thread.State: WAITING (parking)
> >     at sun.misc.Unsafe.park(Native Method)
> >     - parking to wait for  <0x000000075ea67310> (a
> > java.util.concurrent.FutureTask$Sync)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
> >     at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> >     at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> >     at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> >     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> >     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> >     at
> >
> >
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:326)
> >     at
> >
> >
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:261)
> >     at
> >
> >
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:780)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >     at
> >
> >
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:785)
> >
> >    Locked ownable synchronizers:
> >     - None
> >
> > "VM Thread" prio=10 tid=0x00007f540c132000 nid=0x2170 runnable
> >
> > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f540c01c800
> > nid=0x216b runnable
> >
> > "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f540c01e800
> > nid=0x216c runnable
> >
> > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f540c020000
> > nid=0x216d runnable
> >
> > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f540c022000
> > nid=0x216e runnable
> >
> > "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f540c0b1000
> > nid=0x216f runnable "VM Periodic Task Thread" prio=10
> > tid=0x00007f540c16b000 nid=0x217a waiting on condition
> >
> > JNI global references: 1118
> >
> >
> > 2013/12/18 Ted Yu <yu...@gmail.com>
> >
> > > Tao:
> > > Can you jstack one such process next time you see them hanging ?
> > >
> > > Thanks
> > >
> > >
> > > On Tue, Dec 17, 2013 at 6:31 PM, Tao Xiao <xi...@gmail.com>
> > > wrote:
> > >
> > > > BTW, I noticed another problem. I bulk load data into HBase every
> > > > five minutes, but I found that whenever the following command was
> > executed
> > > >     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > > HFiles-Dir  MyTable
> > > >
> > > > there is a new process called "LoadIncrementalHFiles"
> > > >
> > > > I can see many processes called "LoadIncrementalHFiles" using the
> > > > command "jps" in the terminal, why are these processes still there
> > > > even after the command that bulk load HFiles into HBase has finished
> > > > executing ? I have
> > > to
> > > > kill them myself.
> > > >
> > > >
> > > > 2013/12/17 Bijieshan <bi...@huawei.com>
> > > >
> > > > > Yes, it should be cleaned up. But not included in current code in
> > > > > my understanding.
> > > > >
> > > > > Jieshan.
> > > > > -----Original Message-----
> > > > > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > > > > Sent: Tuesday, December 17, 2013 10:55 AM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: Why so many unexpected files like partitions_xxxx are
> > > > created?
> > > > >
> > > > > Should bulk load task clean up partitions_xxxx upon completion ?
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com>
> > > wrote:
> > > > >
> > > > > > >  I think I should delete these files immediately after I have
> > > > > > > finished
> > > > > > bulk loading data into HBase since they are useless at that
> > > > > > time,
> > > > right ?
> > > > > >
> > > > > > Ya. I think so. They are useless once bulk load task finished.
> > > > > >
> > > > > > Jieshan.
> > > > > > -----Original Message-----
> > > > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > > > Sent: Tuesday, December 17, 2013 9:34 AM
> > > > > > To: user@hbase.apache.org
> > > > > > Subject: Re: Why so many unexpected files like partitions_xxxx
> > > > > > are
> > > > > created?
> > > > > >
> > > > > > Indeed these files are produced by
> > org.apache.hadoop.hbase.mapreduce.
> > > > > > LoadIncrementalHFiles in the directory specified by what
> > > > > > job.getWorkingDirectory()
> > > > > > returns, and I think I should delete these files immediately
> > > > > > after I have finished bulk loading data into HBase since they
> > > > > > are useless at that time, right ?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2013/12/16 Bijieshan <bi...@huawei.com>
> > > > > >
> > > > > > > The reduce partition information is stored in this
> > > > > > > partition_XXXX
> > > > file.
> > > > > > > See the below code:
> > > > > > >
> > > > > > > HFileOutputFormat#configureIncrementalLoad:
> > > > > > >         .....................
> > > > > > >     Path partitionsPath = new Path(job.getWorkingDirectory(),
> > > > > > >                                    "partitions_" +
> > > > UUID.randomUUID());
> > > > > > >     LOG.info("Writing partition information to " +
> > > > > > > partitionsPath);
> > > > > > >
> > > > > > >     FileSystem fs = partitionsPath.getFileSystem(conf);
> > > > > > >     writePartitions(conf, partitionsPath, startKeys);
> > > > > > >         .....................
> > > > > > >
> > > > > > > Hoping it helps.
> > > > > > >
> > > > > > > Jieshan
> > > > > > > -----Original Message-----
> > > > > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > > > > Sent: Monday, December 16, 2013 6:48 PM
> > > > > > > To: user@hbase.apache.org
> > > > > > > Subject: Why so many unexpected files like partitions_xxxx are
> > > > created?
> > > > > > >
> > > > > > > I imported data into HBase in the fashion of bulk load,  but
> > > > > > > after that I found many unexpected files were created in the
> > > > > > > HDFS
> > > directory
> > > > > > > of /user/root/, and they look like these:
> > > > > > >
> > > > > > > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > > > > > > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > > > > > > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > > > > > > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > > > > > > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > > > > > > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > > > > > > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > > > > > > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > > > > > > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > > > > > > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > > > > > > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > > > > > > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > > > > > > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > > > > > > ... ...
> > > > > > > ... ...
> > > > > > >
> > > > > > >
> > > > > > > It seems that they are HFiles, but I don't know why they were
> > > created
> > > > > > here?
> > > > > > >
> > > > > > > I bulk load data into HBase in the following way:
> > > > > > >
> > > > > > > Firstly,   I wrote a MapReduce program which only has map
> tasks.
> > > The
> > > > > map
> > > > > > > tasks read some text data and emit them in the form of  RowKey
> > > > > > > and KeyValue. The following is my program:
> > > > > > >
> > > > > > >         @Override
> > > > > > >         protected void map(NullWritable NULL,
> > > > > > > GtpcV1SignalWritable signal, Context ctx) throws
> > InterruptedException, IOException {
> > > > > > >             String strRowkey = xxx;
> > > > > > >             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> > > > > > >
> > > > > > >             rowkey.set(rowkeyBytes);
> > > > > > >
> > > > > > >             part1.init(signal);
> > > > > > >             part2.init(signal);
> > > > > > >
> > > > > > >             KeyValue kv = new KeyValue(rowkeyBytes, Family_A,
> > > > > > > Qualifier_Q, part1.serialize());
> > > > > > >             ctx.write(rowkey, kv);
> > > > > > >
> > > > > > >             kv = new KeyValue(rowkeyBytes, Family_B,
> > > > > > > Qualifier_Q, part2.serialize());
> > > > > > >             ctx.write(rowkey, kv);
> > > > > > >         }
> > > > > > >
> > > > > > >
> > > > > > > after the MR programs finished, there were several HFiles
> > > > > > > generated in the output directory I specified.
> > > > > > >
> > > > > > > Then I began to load these HFiles into HBase using the
> > > > > > > following
> > > > > command:
> > > > > > >        hbase
> > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > > > > > HFiles-Dir  MyTable
> > > > > > >
> > > > > > > Finally , I could see that the data were indeed loaded into
> > > > > > > the table in HBase.
> > > > > > >
> > > > > > >
> > > > > > > But, I could also see that there were many unexpected files
> > > > > > > generated in the HDFS directory of  /user/root/,  just as I
> > > > > > > have mentioned at the beginning of this mail,  and I did not
> > > > > > > specify any files to be produced in this directory.
> > > > > > >
> > > > > > > What happened? Who can tell me what these files are and who
> > > > > > > produced
> > > > > > them?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Ted Yu <yu...@gmail.com>.
From the stack trace posted I saw:

org.apache.commons.logging.impl.Log4JLogger.error(Log4JLogger.java:257)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(
LoadIncrementalHFiles.java:577)

Assuming 0.94 is used, line 577 at the tip of 0.94 is:
        LOG.warn("Attempt to bulk load region containing "
            + Bytes.toStringBinary(first) + " into table "

But the following should be the corresponding line w.r.t. the stack trace:
    } catch (IOException e) {
      LOG.error("Encountered unrecoverable error from region server", e);

Tao:
Can you check the log of LoadIncrementalHFiles to see what the error from
the region server was?
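
Note that LoadIncrementalHFiles runs client side and, judging by the jstack
output you posted, it logs through a log4j ConsoleAppender, so its messages
go to the terminal you launched it from. If you want a log file to dig
through, you can redirect the output yourself, e.g. (the path is only an
example):

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
        HFiles-Dir MyTable > /tmp/bulkload.log 2>&1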

As Jieshan said, checking the region server log should reveal something.

Cheers


On Tue, Dec 17, 2013 at 10:40 PM, Bijieshan <bi...@huawei.com> wrote:

> It seems LoadIncrementalHFiles is still running.  Can you run "jstack" on
> 1 RegionServer process also?
>
> Which version are you using?
>
> Jieshan.
> -----Original Message-----
> From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> Sent: Wednesday, December 18, 2013 1:49 PM
> To: user@hbase.apache.org
> Subject: Re: Why so many unexpected files like partitions_xxxx are created?
>
> I did jstack one such process and can see the following output in the
> terminal, and I guess this info told us that the processes started by the
> command "LoadIncrementalHFiles" never exit. Why didn't they exit after
> finished running ?
>
> ... ...
> ... ...
>
> "LoadIncrementalHFiles-0.LruBlockCache.EvictionThread" daemon prio=10
> tid=0x000000004129c000 nid=0x2186 in Object.wait() [0x00007f53f3665000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x000000075fcf3370> (a
> org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
>     at java.lang.Object.wait(Object.java:485)
>     at
>
> org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread.run(LruBlockCache.java:631)
>     - locked <0x000000075fcf3370> (a
> org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
>     at java.lang.Thread.run(Thread.java:662)
>
>    Locked ownable synchronizers:
>     - None
>
> "LoadIncrementalHFiles-3" prio=10 tid=0x00007f540ca55800 nid=0x2185
> runnable [0x00007f53f3765000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:282)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>     - locked <0x0000000763e5af70> (a java.io.BufferedOutputStream)
>     at java.io.PrintStream.write(PrintStream.java:430)
>     - locked <0x0000000763d5b670> (a java.io.PrintStream)
>     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
>     at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
>     at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
>     - locked <0x0000000763d6c6d0> (a java.io.OutputStreamWriter)
>     at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:116)
>     at java.io.OutputStreamWriter.write(OutputStreamWriter.java:203)
>     at java.io.Writer.write(Writer.java:140)
>     at org.apache.log4j.helpers.QuietWriter.write(QuietWriter.java:48)
>     at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:317)
>     at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
>     at
> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>     - locked <0x0000000763d5fb90> (a org.apache.log4j.ConsoleAppender)
>     at
>
> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>     at org.apache.log4j.Category.callAppenders(Category.java:206)
>     - locked <0x0000000763d65fe8> (a org.apache.log4j.spi.RootLogger)
>     at org.apache.log4j.Category.forcedLog(Category.java:391)
>     at org.apache.log4j.Category.log(Category.java:856)
>     at
> org.apache.commons.logging.impl.Log4JLogger.error(Log4JLogger.java:257)
>     at
>
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:577)
>     at
>
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:316)
>     at
>
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:314)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:662)
>
>    Locked ownable synchronizers:
>     - <0x000000075fe494c0> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>
> ... ...
> ... ...
>
> "Reference Handler" daemon prio=10 tid=0x00007f540c138800 nid=0x2172 in
> Object.wait() [0x00007f5401355000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x0000000763d51078> (a java.lang.ref.Reference$Lock)
>     at java.lang.Object.wait(Object.java:485)
>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>     - locked <0x0000000763d51078> (a java.lang.ref.Reference$Lock)
>
>    Locked ownable synchronizers:
>     - None
>
> "main" prio=10 tid=0x00007f540c00e000 nid=0x216a waiting on condition
> [0x00007f54114ac000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x000000075ea67310> (a
> java.util.concurrent.FutureTask$Sync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>     at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>     at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>     at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>     at
>
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:326)
>     at
>
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:261)
>     at
>
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:780)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at
>
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:785)
>
>    Locked ownable synchronizers:
>     - None
>
> "VM Thread" prio=10 tid=0x00007f540c132000 nid=0x2170 runnable
>
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f540c01c800
> nid=0x216b runnable
>
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f540c01e800
> nid=0x216c runnable
>
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f540c020000
> nid=0x216d runnable
>
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f540c022000
> nid=0x216e runnable
>
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f540c0b1000
> nid=0x216f runnable "VM Periodic Task Thread" prio=10
> tid=0x00007f540c16b000 nid=0x217a waiting on condition
>
> JNI global references: 1118
>
>
> 2013/12/18 Ted Yu <yu...@gmail.com>
>
> > Tao:
> > Can you jstack one such process next time you see them hanging ?
> >
> > Thanks
> >
> >
> > On Tue, Dec 17, 2013 at 6:31 PM, Tao Xiao <xi...@gmail.com>
> > wrote:
> >
> > > BTW, I noticed another problem. I bulk load data into HBase every
> > > five minutes, but I found that whenever the following command was
> executed
> > >     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > HFiles-Dir  MyTable
> > >
> > > there is a new process called "LoadIncrementalHFiles"
> > >
> > > I can see many processes called "LoadIncrementalHFiles" using the
> > > command "jps" in the terminal, why are these processes still there
> > > even after the command that bulk load HFiles into HBase has finished
> > > executing ? I have
> > to
> > > kill them myself.
> > >
> > >
> > > 2013/12/17 Bijieshan <bi...@huawei.com>
> > >
> > > > Yes, it should be cleaned up. But not included in current code in
> > > > my understanding.
> > > >
> > > > Jieshan.
> > > > -----Original Message-----
> > > > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > > > Sent: Tuesday, December 17, 2013 10:55 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: Why so many unexpected files like partitions_xxxx are
> > > created?
> > > >
> > > > Should bulk load task clean up partitions_xxxx upon completion ?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com>
> > wrote:
> > > >
> > > > > >  I think I should delete these files immediately after I have
> > > > > > finished
> > > > > bulk loading data into HBase since they are useless at that
> > > > > time,
> > > right ?
> > > > >
> > > > > Ya. I think so. They are useless once bulk load task finished.
> > > > >
> > > > > Jieshan.
> > > > > -----Original Message-----
> > > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > > Sent: Tuesday, December 17, 2013 9:34 AM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: Why so many unexpected files like partitions_xxxx
> > > > > are
> > > > created?
> > > > >
> > > > > Indeed these files are produced by
> org.apache.hadoop.hbase.mapreduce.
> > > > > LoadIncrementalHFiles in the directory specified by what
> > > > > job.getWorkingDirectory()
> > > > > returns, and I think I should delete these files immediately
> > > > > after I have finished bulk loading data into HBase since they
> > > > > are useless at that time, right ?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2013/12/16 Bijieshan <bi...@huawei.com>
> > > > >
> > > > > > The reduce partition information is stored in this
> > > > > > partition_XXXX
> > > file.
> > > > > > See the below code:
> > > > > >
> > > > > > HFileOutputFormat#configureIncrementalLoad:
> > > > > >         .....................
> > > > > >     Path partitionsPath = new Path(job.getWorkingDirectory(),
> > > > > >                                    "partitions_" +
> > > UUID.randomUUID());
> > > > > >     LOG.info("Writing partition information to " +
> > > > > > partitionsPath);
> > > > > >
> > > > > >     FileSystem fs = partitionsPath.getFileSystem(conf);
> > > > > >     writePartitions(conf, partitionsPath, startKeys);
> > > > > >         .....................
> > > > > >
> > > > > > Hoping it helps.
> > > > > >
> > > > > > Jieshan
> > > > > > -----Original Message-----
> > > > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > > > Sent: Monday, December 16, 2013 6:48 PM
> > > > > > To: user@hbase.apache.org
> > > > > > Subject: Why so many unexpected files like partitions_xxxx are
> > > created?
> > > > > >
> > > > > > I imported data into HBase in the fashion of bulk load,  but
> > > > > > after that I found many unexpected files were created in the
> > > > > > HDFS
> > directory
> > > > > > of /user/root/, and they look like these:
> > > > > >
> > > > > > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > > > > > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > > > > > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > > > > > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > > > > > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > > > > > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > > > > > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > > > > > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > > > > > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > > > > > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > > > > > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > > > > > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > > > > > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > > > > > ... ...
> > > > > > ... ...
> > > > > >
> > > > > >
> > > > > > It seems that they are HFiles, but I don't know why they were
> > created
> > > > > here?
> > > > > >
> > > > > > I bulk load data into HBase in the following way:
> > > > > >
> > > > > > Firstly,   I wrote a MapReduce program which only has map tasks.
> > The
> > > > map
> > > > > > tasks read some text data and emit them in the form of  RowKey
> > > > > > and KeyValue. The following is my program:
> > > > > >
> > > > > >         @Override
> > > > > >         protected void map(NullWritable NULL,
> > > > > > GtpcV1SignalWritable signal, Context ctx) throws
> InterruptedException, IOException {
> > > > > >             String strRowkey = xxx;
> > > > > >             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> > > > > >
> > > > > >             rowkey.set(rowkeyBytes);
> > > > > >
> > > > > >             part1.init(signal);
> > > > > >             part2.init(signal);
> > > > > >
> > > > > >             KeyValue kv = new KeyValue(rowkeyBytes, Family_A,
> > > > > > Qualifier_Q, part1.serialize());
> > > > > >             ctx.write(rowkey, kv);
> > > > > >
> > > > > >             kv = new KeyValue(rowkeyBytes, Family_B,
> > > > > > Qualifier_Q, part2.serialize());
> > > > > >             ctx.write(rowkey, kv);
> > > > > >         }
> > > > > >
> > > > > >
> > > > > > after the MR programs finished, there were several HFiles
> > > > > > generated in the output directory I specified.
> > > > > >
> > > > > > Then I began to load these HFiles into HBase using the
> > > > > > following
> > > > command:
> > > > > >        hbase
> > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > > > > HFiles-Dir  MyTable
> > > > > >
> > > > > > Finally , I could see that the data were indeed loaded into
> > > > > > the table in HBase.
> > > > > >
> > > > > >
> > > > > > But, I could also see that there were many unexpected files
> > > > > > generated in the HDFS directory of  /user/root/,  just as I
> > > > > > have mentioned at the beginning of this mail,  and I did not
> > > > > > specify any files to be produced in this directory.
> > > > > >
> > > > > > What happened? Who can tell me what these files are and who
> > > > > > produced
> > > > > them?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >
>

RE: Why so many unexpected files like partitions_xxxx are created?

Posted by Bijieshan <bi...@huawei.com>.
It seems LoadIncrementalHFiles is still running. Can you also run "jstack" on one RegionServer process?
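
For example (the output path below is only an illustration):

    jps | grep HRegionServer          # note the RegionServer pid
    jstack <pid> > /tmp/regionserver.jstack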

Which version are you using?

Jieshan.
-----Original Message-----
From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com] 
Sent: Wednesday, December 18, 2013 1:49 PM
To: user@hbase.apache.org
Subject: Re: Why so many unexpected files like partitions_xxxx are created?

I ran jstack on one such process and saw the following output in the terminal. I think this shows that the processes started by the "LoadIncrementalHFiles" command never exit. Why don't they exit after they finish running?

... ...
... ...

"LoadIncrementalHFiles-0.LruBlockCache.EvictionThread" daemon prio=10
tid=0x000000004129c000 nid=0x2186 in Object.wait() [0x00007f53f3665000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x000000075fcf3370> (a
org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
    at java.lang.Object.wait(Object.java:485)
    at
org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread.run(LruBlockCache.java:631)
    - locked <0x000000075fcf3370> (a
org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - None

"LoadIncrementalHFiles-3" prio=10 tid=0x00007f540ca55800 nid=0x2185 runnable [0x00007f53f3765000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:282)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    - locked <0x0000000763e5af70> (a java.io.BufferedOutputStream)
    at java.io.PrintStream.write(PrintStream.java:430)
    - locked <0x0000000763d5b670> (a java.io.PrintStream)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
    - locked <0x0000000763d6c6d0> (a java.io.OutputStreamWriter)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:116)
    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:203)
    at java.io.Writer.write(Writer.java:140)
    at org.apache.log4j.helpers.QuietWriter.write(QuietWriter.java:48)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:317)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    - locked <0x0000000763d5fb90> (a org.apache.log4j.ConsoleAppender)
    at
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    - locked <0x0000000763d65fe8> (a org.apache.log4j.spi.RootLogger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at
org.apache.commons.logging.impl.Log4JLogger.error(Log4JLogger.java:257)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:577)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:316)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:314)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000075fe494c0> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)

... ...
... ...

"Reference Handler" daemon prio=10 tid=0x00007f540c138800 nid=0x2172 in
Object.wait() [0x00007f5401355000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000000763d51078> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x0000000763d51078> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
    - None

"main" prio=10 tid=0x00007f540c00e000 nid=0x216a waiting on condition [0x00007f54114ac000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000075ea67310> (a
java.util.concurrent.FutureTask$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
    at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:326)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:261)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:780)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:785)

   Locked ownable synchronizers:
    - None

"VM Thread" prio=10 tid=0x00007f540c132000 nid=0x2170 runnable

"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f540c01c800 nid=0x216b runnable

"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f540c01e800 nid=0x216c runnable

"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f540c020000 nid=0x216d runnable

"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f540c022000 nid=0x216e runnable

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f540c0b1000 nid=0x216f runnable "VM Periodic Task Thread" prio=10 tid=0x00007f540c16b000 nid=0x217a waiting on condition

JNI global references: 1118


2013/12/18 Ted Yu <yu...@gmail.com>

> Tao:
> Can you jstack one such process next time you see them hanging ?
>
> Thanks
>
>
> On Tue, Dec 17, 2013 at 6:31 PM, Tao Xiao <xi...@gmail.com>
> wrote:
>
> > BTW, I noticed another problem. I bulk load data into HBase every 
> > five minutes, but I found that whenever the following command was executed
> >     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > HFiles-Dir  MyTable
> >
> > there is a new process called "LoadIncrementalHFiles"
> >
> > I can see many processes called "LoadIncrementalHFiles" using the 
> > command "jps" in the terminal, why are these processes still there 
> > even after the command that bulk load HFiles into HBase has finished 
> > executing ? I have
> to
> > kill them myself.
> >
> >
> > 2013/12/17 Bijieshan <bi...@huawei.com>
> >
> > > Yes, it should be cleaned up. But not included in current code in 
> > > my understanding.
> > >
> > > Jieshan.
> > > -----Original Message-----
> > > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > > Sent: Tuesday, December 17, 2013 10:55 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Why so many unexpected files like partitions_xxxx are
> > created?
> > >
> > > Should bulk load task clean up partitions_xxxx upon completion ?
> > >
> > > Cheers
> > >
> > >
> > > On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com>
> wrote:
> > >
> > > > >  I think I should delete these files immediately after I have 
> > > > > finished
> > > > bulk loading data into HBase since they are useless at that 
> > > > time,
> > right ?
> > > >
> > > > Ya. I think so. They are useless once bulk load task finished.
> > > >
> > > > Jieshan.
> > > > -----Original Message-----
> > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > Sent: Tuesday, December 17, 2013 9:34 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: Why so many unexpected files like partitions_xxxx 
> > > > are
> > > created?
> > > >
> > > > Indeed these files are produced by org.apache.hadoop.hbase.mapreduce.
> > > > LoadIncrementalHFiles in the directory specified by what
> > > > job.getWorkingDirectory()
> > > > returns, and I think I should delete these files immediately 
> > > > after I have finished bulk loading data into HBase since they 
> > > > are useless at that time, right ?
> > > >
> > > >
> > > >
> > > >
> > > > 2013/12/16 Bijieshan <bi...@huawei.com>
> > > >
> > > > > The reduce partition information is stored in this 
> > > > > partition_XXXX
> > file.
> > > > > See the below code:
> > > > >
> > > > > HFileOutputFormat#configureIncrementalLoad:
> > > > >         .....................
> > > > >     Path partitionsPath = new Path(job.getWorkingDirectory(),
> > > > >                                    "partitions_" +
> > UUID.randomUUID());
> > > > >     LOG.info("Writing partition information to " + 
> > > > > partitionsPath);
> > > > >
> > > > >     FileSystem fs = partitionsPath.getFileSystem(conf);
> > > > >     writePartitions(conf, partitionsPath, startKeys);
> > > > >         .....................
> > > > >
> > > > > Hoping it helps.
> > > > >
> > > > > Jieshan
> > > > > -----Original Message-----
> > > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > > Sent: Monday, December 16, 2013 6:48 PM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Why so many unexpected files like partitions_xxxx are
> > created?
> > > > >
> > > > > I imported data into HBase in the fashion of bulk load,  but 
> > > > > after that I found many unexpected files were created in the
> > > > > HDFS
> directory
> > > > > of /user/root/, and they look like these:
> > > > >
> > > > > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > > > > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > > > > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > > > > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > > > > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > > > > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > > > > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > > > > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > > > > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > > > > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > > > > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > > > > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > > > > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > > > > ... ...
> > > > > ... ...
> > > > >
> > > > >
> > > > > It seems that they are HFiles, but I don't know why they were
> created
> > > > here?
> > > > >
> > > > > I bulk load data into HBase in the following way:
> > > > >
> > > > > Firstly,   I wrote a MapReduce program which only has map tasks.
> The
> > > map
> > > > > tasks read some text data and emit them in the form of  RowKey 
> > > > > and KeyValue.The following is my program:
> > > > >
> > > > >         @Override
> > > > >         protected void map(NullWritable NULL, 
> > > > > GtpcV1SignalWritable signal, Context ctx) throws InterruptedException, IOException {
> > > > >             String strRowkey = xxx;
> > > > >             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> > > > >
> > > > >             rowkey.set(rowkeyBytes);
> > > > >
> > > > >             part1.init(signal);
> > > > >             part2.init(signal);
> > > > >
> > > > >             KeyValue kv = new KeyValue(rowkeyBytes, Family_A, 
> > > > > Qualifier_Q, part1.serialize());
> > > > >             ctx.write(rowkey, kv);
> > > > >
> > > > >             kv = new KeyValue(rowkeyBytes, Family_B, 
> > > > > Qualifier_Q, part2.serialize());
> > > > >             ctx.write(rowkey, kv);
> > > > >         }
> > > > >
> > > > >
> > > > > after the MR programs finished, there were several HFiles 
> > > > > generated in the output directory I specified.
> > > > >
> > > > > Then I began to load these HFiles into HBase using the
> > > > > following
> > > command:
> > > > >        hbase
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > > > HFiles-Dir  MyTable
> > > > >
> > > > > Finally , I could see that the data were indeed loaded into 
> > > > > the table in HBase.
> > > > >
> > > > >
> > > > > But, I could also see that there were many unexpected files 
> > > > > generated in the HDFS directory of  /user/root/,  just as I 
> > > > > have mentioned at the beginning of this mail,  and I did not
> > > > > specify any files to be produced in this directory.
> > > > >
> > > > > What happened? Who can tell me what these files are and who
> > > > > produced
> > > > them?
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>

Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Tao Xiao <xi...@gmail.com>.
I ran jstack on one such process and saw the following output in the
terminal. I think this shows that the processes started by the
"LoadIncrementalHFiles" command never exit. Why don't they exit after they
finish running?

... ...
... ...

"LoadIncrementalHFiles-0.LruBlockCache.EvictionThread" daemon prio=10
tid=0x000000004129c000 nid=0x2186 in Object.wait() [0x00007f53f3665000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x000000075fcf3370> (a
org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
    at java.lang.Object.wait(Object.java:485)
    at
org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread.run(LruBlockCache.java:631)
    - locked <0x000000075fcf3370> (a
org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - None

"LoadIncrementalHFiles-3" prio=10 tid=0x00007f540ca55800 nid=0x2185
runnable [0x00007f53f3765000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:282)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    - locked <0x0000000763e5af70> (a java.io.BufferedOutputStream)
    at java.io.PrintStream.write(PrintStream.java:430)
    - locked <0x0000000763d5b670> (a java.io.PrintStream)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
    - locked <0x0000000763d6c6d0> (a java.io.OutputStreamWriter)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:116)
    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:203)
    at java.io.Writer.write(Writer.java:140)
    at org.apache.log4j.helpers.QuietWriter.write(QuietWriter.java:48)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:317)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    - locked <0x0000000763d5fb90> (a org.apache.log4j.ConsoleAppender)
    at
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    - locked <0x0000000763d65fe8> (a org.apache.log4j.spi.RootLogger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at
org.apache.commons.logging.impl.Log4JLogger.error(Log4JLogger.java:257)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:577)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:316)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:314)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000075fe494c0> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)

... ...
... ...

"Reference Handler" daemon prio=10 tid=0x00007f540c138800 nid=0x2172 in
Object.wait() [0x00007f5401355000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000000763d51078> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x0000000763d51078> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
    - None

"main" prio=10 tid=0x00007f540c00e000 nid=0x216a waiting on condition
[0x00007f54114ac000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000075ea67310> (a
java.util.concurrent.FutureTask$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
    at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:326)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:261)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:780)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:785)

   Locked ownable synchronizers:
    - None

"VM Thread" prio=10 tid=0x00007f540c132000 nid=0x2170 runnable

"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f540c01c800
nid=0x216b runnable

"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f540c01e800
nid=0x216c runnable

"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f540c020000
nid=0x216d runnable

"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f540c022000
nid=0x216e runnable

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f540c0b1000 nid=0x216f
runnable
"VM Periodic Task Thread" prio=10 tid=0x00007f540c16b000 nid=0x217a waiting
on condition

JNI global references: 1118
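
For now I have to clean them up by hand; a rough sketch of what I run is
below (the grep pattern simply matches the name that jps prints for these
processes on my machines):

    for pid in $(jps | grep LoadIncrementalHFiles | awk '{print $1}'); do
        kill "$pid"
    done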


2013/12/18 Ted Yu <yu...@gmail.com>

> Tao:
> Can you jstack one such process next time you see them hanging ?
>
> Thanks
>
>
> On Tue, Dec 17, 2013 at 6:31 PM, Tao Xiao <xi...@gmail.com>
> wrote:
>
> > BTW, I noticed another problem. I bulk load data into HBase every five
> > minutes, but I found that whenever the following command was executed
> >     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > HFiles-Dir  MyTable
> >
> > there is a new process called "LoadIncrementalHFiles"
> >
> > I can see many processes called "LoadIncrementalHFiles" using the command
> > "jps" in the terminal, why are these processes still there even after the
> > command that bulk load HFiles into HBase has finished executing ? I have
> to
> > kill them myself.
> >
> >
> > 2013/12/17 Bijieshan <bi...@huawei.com>
> >
> > > Yes, it should be cleaned up. But not included in current code in my
> > > understanding.
> > >
> > > Jieshan.
> > > -----Original Message-----
> > > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > > Sent: Tuesday, December 17, 2013 10:55 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Why so many unexpected files like partitions_xxxx are
> > created?
> > >
> > > Should bulk load task clean up partitions_xxxx upon completion ?
> > >
> > > Cheers
> > >
> > >
> > > On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com>
> wrote:
> > >
> > > > >  I think I should delete these files immediately after I have
> > > > > finished
> > > > bulk loading data into HBase since they are useless at that time,
> > right ?
> > > >
> > > > Ya. I think so. They are useless once bulk load task finished.
> > > >
> > > > Jieshan.
> > > > -----Original Message-----
> > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > Sent: Tuesday, December 17, 2013 9:34 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: Why so many unexpected files like partitions_xxxx are
> > > created?
> > > >
> > > > Indeed these files are produced by org.apache.hadoop.hbase.mapreduce.
> > > > LoadIncrementalHFiles in the directory specified by what
> > > > job.getWorkingDirectory()
> > > > returns, and I think I should delete these files immediately after I
> > > > have finished bulk loading data into HBase since they are useless at
> > > > that time, right ?
> > > >
> > > >
> > > >
> > > >
> > > > 2013/12/16 Bijieshan <bi...@huawei.com>
> > > >
> > > > > The reduce partition information is stored in this partition_XXXX
> > file.
> > > > > See the below code:
> > > > >
> > > > > HFileOutputFormat#configureIncrementalLoad:
> > > > >         .....................
> > > > >     Path partitionsPath = new Path(job.getWorkingDirectory(),
> > > > >                                    "partitions_" +
> > UUID.randomUUID());
> > > > >     LOG.info("Writing partition information to " + partitionsPath);
> > > > >
> > > > >     FileSystem fs = partitionsPath.getFileSystem(conf);
> > > > >     writePartitions(conf, partitionsPath, startKeys);
> > > > >         .....................
> > > > >
> > > > > Hoping it helps.
> > > > >
> > > > > Jieshan
> > > > > -----Original Message-----
> > > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > > Sent: Monday, December 16, 2013 6:48 PM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Why so many unexpected files like partitions_xxxx are
> > created?
> > > > >
> > > > > I imported data into HBase in the fashion of bulk load,  but after
> > > > > that I found many unexpected files were created in the HDFS
> directory
> > > > > of /user/root/, and they look like these:
> > > > >
> > > > > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > > > > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > > > > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > > > > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > > > > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > > > > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > > > > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > > > > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > > > > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > > > > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > > > > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > > > > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > > > > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > > > > ... ...
> > > > > ... ...
> > > > >
> > > > >
> > > > > It seems that they are HFiles, but I don't know why they were
> created
> > > > here?
> > > > >
> > > > > I bulk load data into HBase in the following way:
> > > > >
> > > > > Firstly,   I wrote a MapReduce program which only has map tasks.
> The
> > > map
> > > > > tasks read some text data and emit them in the form of  RowKey and
> > > > > KeyValue. The following is my program:
> > > > >
> > > > >         @Override
> > > > >         protected void map(NullWritable NULL, GtpcV1SignalWritable
> > > > > signal, Context ctx) throws InterruptedException, IOException {
> > > > >             String strRowkey = xxx;
> > > > >             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> > > > >
> > > > >             rowkey.set(rowkeyBytes);
> > > > >
> > > > >             part1.init(signal);
> > > > >             part2.init(signal);
> > > > >
> > > > >             KeyValue kv = new KeyValue(rowkeyBytes, Family_A,
> > > > > Qualifier_Q, part1.serialize());
> > > > >             ctx.write(rowkey, kv);
> > > > >
> > > > >             kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
> > > > > part2.serialize());
> > > > >             ctx.write(rowkey, kv);
> > > > >         }
> > > > >
> > > > >
> > > > > after the MR programs finished, there were several HFiles generated
> > > > > in the output directory I specified.
> > > > >
> > > > > Then I began to load these HFiles into HBase using the following
> > > command:
> > > > >        hbase
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > > > HFiles-Dir  MyTable
> > > > >
> > > > > Finally , I could see that the data were indeed loaded into the
> > > > > table in HBase.
> > > > >
> > > > >
> > > > > But, I could also see that there were many unexpected files
> > > > > generated in the HDFS directory of  /user/root/,  just as I have
> > > > > mentioned at the beginning of this mail,  and I did not specify any
> > > > > files to be produced in this directory.
> > > > >
> > > > > What happened? Who can tell me what these files are and who
> > > > > produced
> > > > them?
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>

Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Ted Yu <yu...@gmail.com>.
Tao:
Can you jstack one such process the next time you see them hanging?
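
For example (the output path is just an illustration):

    jps -l | grep LoadIncrementalHFiles    # note the pid
    jstack <pid> > /tmp/loadincrementalhfiles.jstack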

Thanks


On Tue, Dec 17, 2013 at 6:31 PM, Tao Xiao <xi...@gmail.com> wrote:

> BTW, I noticed another problem. I bulk load data into HBase every five
> minutes, but I found that whenever the following command was executed
>     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> HFiles-Dir  MyTable
>
> there is a new process called "LoadIncrementalHFiles"
>
> I can see many processes called "LoadIncrementalHFiles" using the command
> "jps" in the terminal, why are these processes still there even after the
> command that bulk load HFiles into HBase has finished executing ? I have to
> kill them myself.
>
>
> 2013/12/17 Bijieshan <bi...@huawei.com>
>
> > Yes, it should be cleaned up. But not included in current code in my
> > understanding.
> >
> > Jieshan.
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Tuesday, December 17, 2013 10:55 AM
> > To: user@hbase.apache.org
> > Subject: Re: Why so many unexpected files like partitions_xxxx are
> created?
> >
> > Should bulk load task clean up partitions_xxxx upon completion ?
> >
> > Cheers
> >
> >
> > On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com> wrote:
> >
> > > >  I think I should delete these files immediately after I have
> > > > finished
> > > bulk loading data into HBase since they are useless at that time,
> right ?
> > >
> > > Ya. I think so. They are useless once bulk load task finished.
> > >
> > > Jieshan.
> > > -----Original Message-----
> > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > Sent: Tuesday, December 17, 2013 9:34 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Why so many unexpected files like partitions_xxxx are
> > created?
> > >
> > > Indeed these files are produced by org.apache.hadoop.hbase.mapreduce.
> > > LoadIncrementalHFiles in the directory specified by what
> > > job.getWorkingDirectory()
> > > returns, and I think I should delete these files immediately after I
> > > have finished bulk loading data into HBase since they are useless at
> > > that time, right ?
> > >
> > >
> > >
> > >
> > > 2013/12/16 Bijieshan <bi...@huawei.com>
> > >
> > > > The reduce partition information is stored in this partition_XXXX
> file.
> > > > See the below code:
> > > >
> > > > HFileOutputFormat#configureIncrementalLoad:
> > > >         .....................
> > > >     Path partitionsPath = new Path(job.getWorkingDirectory(),
> > > >                                    "partitions_" +
> UUID.randomUUID());
> > > >     LOG.info("Writing partition information to " + partitionsPath);
> > > >
> > > >     FileSystem fs = partitionsPath.getFileSystem(conf);
> > > >     writePartitions(conf, partitionsPath, startKeys);
> > > >         .....................
> > > >
> > > > Hoping it helps.
> > > >
> > > > Jieshan
> > > > -----Original Message-----
> > > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > > Sent: Monday, December 16, 2013 6:48 PM
> > > > To: user@hbase.apache.org
> > > > Subject: Why so many unexpected files like partitions_xxxx are
> created?
> > > >
> > > > I imported data into HBase using bulk load, but after
> > > > that I found many unexpected files were created in the HDFS directory
> > > > of /user/root/, and they look like these:
> > > >
> > > > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > > > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > > > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > > > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > > > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > > > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > > > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > > > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > > > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > > > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > > > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > > > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > > > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > > > ... ...
> > > > ... ...
> > > >
> > > >
> > > > It seems that they are HFiles, but I don't know why they were created
> > > here?
> > > >
> > > > I bulk load data into HBase in the following way:
> > > >
> > > > Firstly,   I wrote a MapReduce program which only has map tasks. The
> > map
> > > > tasks read some text data and emit them in the form of  RowKey and
> > > > KeyValue. The following is my program:
> > > >
> > > >         @Override
> > > >         protected void map(NullWritable NULL, GtpcV1SignalWritable
> > > > signal, Context ctx) throws InterruptedException, IOException {
> > > >             String strRowkey = xxx;
> > > >             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> > > >
> > > >             rowkey.set(rowkeyBytes);
> > > >
> > > >             part1.init(signal);
> > > >             part2.init(signal);
> > > >
> > > >             KeyValue kv = new KeyValue(rowkeyBytes, Family_A,
> > > > Qualifier_Q, part1.serialize());
> > > >             ctx.write(rowkey, kv);
> > > >
> > > >             kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
> > > > part2.serialize());
> > > >             ctx.write(rowkey, kv);
> > > >         }
> > > >
> > > >
> > > > after the MR program finished, there were several HFiles generated
> > > > in the output directory I specified.
> > > >
> > > > Then I began to load these HFiles into HBase using the following
> > command:
> > > >        hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > > HFiles-Dir  MyTable
> > > >
> > > > Finally, I could see that the data were indeed loaded into the
> > > > table in HBase.
> > > >
> > > >
> > > > But, I could also see that there were many unexpected files
> > > > generated in the HDFS directory of  /user/root/,  just as I have
> > > > mentioned at the beginning of this mail, and I did not specify any
> > > > files to be produced in this directory.
> > > >
> > > > What happened? Who can tell me what these files are and who
> > > > produced
> > > them?
> > > >
> > > > Thanks
> > > >
> > >
> >
>

Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Tao Xiao <xi...@gmail.com>.
BTW, I noticed another problem. I bulk load data into HBase every five
minutes, but I found that whenever the following command was executed
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
HFiles-Dir  MyTable

a new process called "LoadIncrementalHFiles" appeared.

I can see many processes called "LoadIncrementalHFiles" using the command
"jps" in the terminal, why are these processes still there even after the
command that bulk load HFiles into HBase has finished executing ? I have to
kill them myself.
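
One workaround I am considering (a minimal sketch, untested, assuming the
0.94-era client API; the class name BulkLoadRunner is mine) is to run the
load in-process instead of through the hbase launcher, so the JVM exit is
under my own control:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadRunner {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "MyTable");
            try {
                // same effect as the command-line invocation above
                new LoadIncrementalHFiles(conf).doBulkLoad(
                        new Path("HFiles-Dir"), table);
            } finally {
                table.close();  // release the client connection
            }
            System.exit(0);     // force-exit if non-daemon threads linger
        }
    }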


2013/12/17 Bijieshan <bi...@huawei.com>

> Yes, it should be cleaned up, but to my understanding that cleanup is not
> included in the current code.
>
> Jieshan.
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Tuesday, December 17, 2013 10:55 AM
> To: user@hbase.apache.org
> Subject: Re: Why so many unexpected files like partitions_xxxx are created?
>
> Should the bulk load task clean up partitions_xxxx upon completion?
>
> Cheers
>
>
> On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com> wrote:
>
> > >  I think I should delete these files immediately after I have
> > > finished
> > bulk loading data into HBase since they are useless at that time, right?
> >
> > Ya, I think so. They are useless once the bulk load task has finished.
> >
> > Jieshan.
> > -----Original Message-----
> > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > Sent: Tuesday, December 17, 2013 9:34 AM
> > To: user@hbase.apache.org
> > Subject: Re: Why so many unexpected files like partitions_xxxx are
> created?
> >
> > Indeed these files are produced by org.apache.hadoop.hbase.mapreduce.
> > LoadIncrementalHFiles in the directory specified by what
> > job.getWorkingDirectory()
> > returns, and I think I should delete these files immediately after I
> > have finished bulk loading data into HBase since they are useless at
> > that time, right?
> >
> >
> >
> >
> > 2013/12/16 Bijieshan <bi...@huawei.com>
> >
> > > The reduce partition information is stored in this partitions_XXXX file.
> > > See the below code:
> > >
> > > HFileOutputFormat#configureIncrementalLoad:
> > >         .....................
> > >     Path partitionsPath = new Path(job.getWorkingDirectory(),
> > >                                    "partitions_" + UUID.randomUUID());
> > >     LOG.info("Writing partition information to " + partitionsPath);
> > >
> > >     FileSystem fs = partitionsPath.getFileSystem(conf);
> > >     writePartitions(conf, partitionsPath, startKeys);
> > >         .....................
> > >
> > > Hoping it helps.
> > >
> > > Jieshan
> > > -----Original Message-----
> > > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > > Sent: Monday, December 16, 2013 6:48 PM
> > > To: user@hbase.apache.org
> > > Subject: Why so many unexpected files like partitions_xxxx are created?
> > >
> > > I imported data into HBase using bulk load, but after
> > > that I found many unexpected files were created in the HDFS directory
> > > of /user/root/, and they look like these:
> > >
> > > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > > ... ...
> > > ... ...
> > >
> > >
> > > It seems that they are HFiles, but I don't know why they were created
> > here?
> > >
> > > I bulk load data into HBase in the following way:
> > >
> > > Firstly,   I wrote a MapReduce program which only has map tasks. The
> map
> > > tasks read some text data and emit them in the form of  RowKey and
> > > KeyValue. The following is my program:
> > >
> > >         @Override
> > >         protected void map(NullWritable NULL, GtpcV1SignalWritable
> > > signal, Context ctx) throws InterruptedException, IOException {
> > >             String strRowkey = xxx;
> > >             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> > >
> > >             rowkey.set(rowkeyBytes);
> > >
> > >             part1.init(signal);
> > >             part2.init(signal);
> > >
> > >             KeyValue kv = new KeyValue(rowkeyBytes, Family_A,
> > > Qualifier_Q, part1.serialize());
> > >             ctx.write(rowkey, kv);
> > >
> > >             kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
> > > part2.serialize());
> > >             ctx.write(rowkey, kv);
> > >         }
> > >
> > >
> > > after the MR program finished, there were several HFiles generated
> > > in the output directory I specified.
> > >
> > > Then I began to load these HFiles into HBase using the following
> command:
> > >        hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > HFiles-Dir  MyTable
> > >
> > > Finally, I could see that the data were indeed loaded into the
> > > table in HBase.
> > >
> > >
> > > But, I could also see that there were many unexpected files
> > > generated in the HDFS directory of  /user/root/,  just as I have
> > > mentioned at the beginning of this mail, and I did not specify any
> > > files to be produced in this directory.
> > >
> > > What happened? Who can tell me what these files are and who
> > > produced
> > them?
> > >
> > > Thanks
> > >
> >
>

RE: Why so many unexpected files like partitions_xxxx are created?

Posted by Bijieshan <bi...@huawei.com>.
Yes, it should be cleaned up, but to my understanding that cleanup is not included in the current code.
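
Until such cleanup lands, a small helper along these lines could prune them
after each load (a sketch only; the class name is mine and the /user/root
path is taken from this thread, so adjust both to your setup):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PartitionsFileCleaner {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // /user/root is where job.getWorkingDirectory() pointed here
            FileStatus[] leftovers =
                    fs.globStatus(new Path("/user/root/partitions_*"));
            if (leftovers != null) {
                for (FileStatus stat : leftovers) {
                    // plain files, so a non-recursive delete is enough
                    fs.delete(stat.getPath(), false);
                }
            }
        }
    }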

Jieshan.
-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Tuesday, December 17, 2013 10:55 AM
To: user@hbase.apache.org
Subject: Re: Why so many unexpected files like partitions_xxxx are created?

Should the bulk load task clean up partitions_xxxx upon completion?

Cheers


On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com> wrote:

> >  I think I should delete these files immediately after I have 
> > finished
> bulk loading data into HBase since they are useless at that time, right?
>
> Ya, I think so. They are useless once the bulk load task has finished.
>
> Jieshan.
> -----Original Message-----
> From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> Sent: Tuesday, December 17, 2013 9:34 AM
> To: user@hbase.apache.org
> Subject: Re: Why so many unexpected files like partitions_xxxx are created?
>
> Indeed these files are produced by org.apache.hadoop.hbase.mapreduce.
> LoadIncrementalHFiles in the directory specified by what
> job.getWorkingDirectory()
> returns, and I think I should delete these files immediately after I 
> have finished bulk loading data into HBase since they are useless at 
> > that time, right?
>
>
>
>
> 2013/12/16 Bijieshan <bi...@huawei.com>
>
> > The reduce partition information is stored in this partitions_XXXX file.
> > See the below code:
> >
> > HFileOutputFormat#configureIncrementalLoad:
> >         .....................
> >     Path partitionsPath = new Path(job.getWorkingDirectory(),
> >                                    "partitions_" + UUID.randomUUID());
> >     LOG.info("Writing partition information to " + partitionsPath);
> >
> >     FileSystem fs = partitionsPath.getFileSystem(conf);
> >     writePartitions(conf, partitionsPath, startKeys);
> >         .....................
> >
> > Hoping it helps.
> >
> > Jieshan
> > -----Original Message-----
> > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > Sent: Monday, December 16, 2013 6:48 PM
> > To: user@hbase.apache.org
> > Subject: Why so many unexpected files like partitions_xxxx are created?
> >
> > I imported data into HBase using bulk load, but after
> > that I found many unexpected files were created in the HDFS directory
> > of /user/root/, and they look like these:
> >
> > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > ... ...
> > ... ...
> >
> >
> > It seems that they are HFiles, but I don't know why they were created
> here?
> >
> > I bulk load data into HBase in the following way:
> >
> > Firstly,   I wrote a MapReduce program which only has map tasks. The map
> > tasks read some text data and emit them in the form of  RowKey and 
> > KeyValue. The following is my program:
> >
> >         @Override
> >         protected void map(NullWritable NULL, GtpcV1SignalWritable 
> > signal, Context ctx) throws InterruptedException, IOException {
> >             String strRowkey = xxx;
> >             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> >
> >             rowkey.set(rowkeyBytes);
> >
> >             part1.init(signal);
> >             part2.init(signal);
> >
> >             KeyValue kv = new KeyValue(rowkeyBytes, Family_A, 
> > Qualifier_Q, part1.serialize());
> >             ctx.write(rowkey, kv);
> >
> >             kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q, 
> > part2.serialize());
> >             ctx.write(rowkey, kv);
> >         }
> >
> >
> > after the MR program finished, there were several HFiles generated
> > in the output directory I specified.
> >
> > Then I began to load these HFiles into HBase using the following command:
> >        hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > HFiles-Dir  MyTable
> >
> > Finally, I could see that the data were indeed loaded into the
> > table in HBase.
> >
> >
> > But, I could also see that there were many unexpected files 
> > generated in the HDFS directory of  /user/root/,  just as I have 
> > mentioned at the beginning of this mail, and I did not specify any
> > files to be produced in this directory.
> >
> > What happened? Who can tell me what these files are and who
> > produced
> them?
> >
> > Thanks
> >
>

Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Ted Yu <yu...@gmail.com>.
Should the bulk load task clean up partitions_xxxx upon completion?

Cheers


On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <bi...@huawei.com> wrote:

> >  I think I should delete these files immediately after I have finished
> bulk loading data into HBase since they are useless at that time, right?
>
> Ya, I think so. They are useless once the bulk load task has finished.
>
> Jieshan.
> -----Original Message-----
> From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> Sent: Tuesday, December 17, 2013 9:34 AM
> To: user@hbase.apache.org
> Subject: Re: Why so many unexpected files like partitions_xxxx are created?
>
> Indeed these files are produced by org.apache.hadoop.hbase.mapreduce.
> LoadIncrementalHFiles in the directory specified by what
> job.getWorkingDirectory()
> returns, and I think I should delete these files immediately after I have
> finished bulk loading data into HBase since they are useless at that time,
> right?
>
>
>
>
> 2013/12/16 Bijieshan <bi...@huawei.com>
>
> > The reduce partition information is stored in this partitions_XXXX file.
> > See the below code:
> >
> > HFileOutputFormat#configureIncrementalLoad:
> >         .....................
> >     Path partitionsPath = new Path(job.getWorkingDirectory(),
> >                                    "partitions_" + UUID.randomUUID());
> >     LOG.info("Writing partition information to " + partitionsPath);
> >
> >     FileSystem fs = partitionsPath.getFileSystem(conf);
> >     writePartitions(conf, partitionsPath, startKeys);
> >         .....................
> >
> > Hoping it helps.
> >
> > Jieshan
> > -----Original Message-----
> > From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> > Sent: Monday, December 16, 2013 6:48 PM
> > To: user@hbase.apache.org
> > Subject: Why so many unexpected files like partitions_xxxx are created?
> >
> > I imported data into HBase using bulk load, but after
> > that I found many unexpected files were created in the HDFS directory
> > of /user/root/, and they look like these:
> >
> > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > ... ...
> > ... ...
> >
> >
> > It seems that they are HFiles, but I don't know why they were created
> here?
> >
> > I bulk load data into HBase in the following way:
> >
> > Firstly,   I wrote a MapReduce program which only has map tasks. The map
> > tasks read some text data and emit them in the form of  RowKey and
> > KeyValue. The following is my program:
> >
> >         @Override
> >         protected void map(NullWritable NULL, GtpcV1SignalWritable
> > signal, Context ctx) throws InterruptedException, IOException {
> >             String strRowkey = xxx;
> >             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> >
> >             rowkey.set(rowkeyBytes);
> >
> >             part1.init(signal);
> >             part2.init(signal);
> >
> >             KeyValue kv = new KeyValue(rowkeyBytes, Family_A,
> > Qualifier_Q, part1.serialize());
> >             ctx.write(rowkey, kv);
> >
> >             kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
> > part2.serialize());
> >             ctx.write(rowkey, kv);
> >         }
> >
> >
> > after the MR program finished, there were several HFiles generated in
> > the output directory I specified.
> >
> > Then I began to load these HFiles into HBase using the following command:
> >        hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > HFiles-Dir  MyTable
> >
> > Finally, I could see that the data were indeed loaded into the table
> > in HBase.
> >
> >
> > But, I could also see that there were many unexpected files generated
> > in the HDFS directory of  /user/root/,  just as I have mentioned at
> > the beginning of this mail, and I did not specify any files to be
> > produced in this directory.
> >
> > What happened? Who can tell me what these files are and who produced
> them?
> >
> > Thanks
> >
>

RE: Why so many unexpected files like partitions_xxxx are created?

Posted by Bijieshan <bi...@huawei.com>.
>  I think I should delete these files immediately after I have finished bulk loading data into HBase since they are useless at that time, right?

Ya, I think so. They are useless once the bulk load task has finished.
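
A one-off cleanup from the shell (assuming they all sit under /user/root as
in your listing) would be something like:

    hadoop fs -rm '/user/root/partitions_*'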

Jieshan.
-----Original Message-----
From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com] 
Sent: Tuesday, December 17, 2013 9:34 AM
To: user@hbase.apache.org
Subject: Re: Why so many unexpected files like partitions_xxxx are created?

Indeed these files are produced by org.apache.hadoop.hbase.mapreduce.
LoadIncrementalHFiles in the directory specified by what
job.getWorkingDirectory()
returns, and I think I should delete these files immediately after I have finished bulk loading data into HBase since they are useless at that time, right?




2013/12/16 Bijieshan <bi...@huawei.com>

> The reduce partition information is stored in this partitions_XXXX file.
> See the below code:
>
> HFileOutputFormat#configureIncrementalLoad:
>         .....................
>     Path partitionsPath = new Path(job.getWorkingDirectory(),
>                                    "partitions_" + UUID.randomUUID());
>     LOG.info("Writing partition information to " + partitionsPath);
>
>     FileSystem fs = partitionsPath.getFileSystem(conf);
>     writePartitions(conf, partitionsPath, startKeys);
>         .....................
>
> Hoping it helps.
>
> Jieshan
> -----Original Message-----
> From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> Sent: Monday, December 16, 2013 6:48 PM
> To: user@hbase.apache.org
> Subject: Why so many unexpected files like partitions_xxxx are created?
>
> I imported data into HBase using bulk load, but after
> that I found many unexpected files were created in the HDFS directory
> of /user/root/, and they look like these:
>
> /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> ... ...
> ... ...
>
>
> It seems that they are HFiles, but I don't know why they were created here?
>
> I bulk load data into HBase in the following way:
>
> Firstly,   I wrote a MapReduce program which only has map tasks. The map
> tasks read some text data and emit them in the form of  RowKey and 
> KeyValue. The following is my program:
>
>         @Override
>         protected void map(NullWritable NULL, GtpcV1SignalWritable 
> signal, Context ctx) throws InterruptedException, IOException {
>             String strRowkey = xxx;
>             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
>
>             rowkey.set(rowkeyBytes);
>
>             part1.init(signal);
>             part2.init(signal);
>
>             KeyValue kv = new KeyValue(rowkeyBytes, Family_A, 
> Qualifier_Q, part1.serialize());
>             ctx.write(rowkey, kv);
>
>             kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q, 
> part2.serialize());
>             ctx.write(rowkey, kv);
>         }
>
>
> after the MR program finished, there were several HFiles generated in
> the output directory I specified.
>
> Then I began to load these HFiles into HBase using the following command:
>        hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> HFiles-Dir  MyTable
>
> Finally, I could see that the data were indeed loaded into the table
> in HBase.
>
>
> But, I could also see that there were many unexpected files generated 
> in the HDFS directory of  /user/root/,  just as I have mentioned at 
> the beginning of this mail, and I did not specify any files to be
> produced in this directory.
>
> What happened? Who can tell me what these files are and who produced them?
>
> Thanks
>

Re: Why so many unexpected files like partitions_xxxx are created?

Posted by Tao Xiao <xi...@gmail.com>.
Indeed these files are produced by org.apache.hadoop.hbase.mapreduce.
LoadIncrementalHFiles in the directory specified by what
job.getWorkingDirectory()
returns, and I think I should delete these files immediately after I have
finished bulk loading data into HBase since they are useless at that time,
right?
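
If deleting them each time gets tedious, another option I am considering (a
sketch only, untested; the scratch path and class name are mine, and whether
Job.setWorkingDirectory behaves this way should be verified on your Hadoop
version) is to point the job's working directory at a scratch location
before configureIncrementalLoad runs, so the partitions_xxxx files collect
somewhere easy to wipe in one go:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class PrepareHFilesJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "hfile-prepare");
            // hypothetical scratch path; partitions_xxxx would land here
            // instead of the user home directory
            job.setWorkingDirectory(new Path("/tmp/bulkload-scratch"));
            HFileOutputFormat.configureIncrementalLoad(
                    job, new HTable(conf, "MyTable"));
            // ... mapper class, input/output paths, etc. as in my driver ...
        }
    }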




2013/12/16 Bijieshan <bi...@huawei.com>

> The reduce partition information is stored in this partitions_XXXX file.
> See the below code:
>
> HFileOutputFormat#configureIncrementalLoad:
>         .....................
>     Path partitionsPath = new Path(job.getWorkingDirectory(),
>                                    "partitions_" + UUID.randomUUID());
>     LOG.info("Writing partition information to " + partitionsPath);
>
>     FileSystem fs = partitionsPath.getFileSystem(conf);
>     writePartitions(conf, partitionsPath, startKeys);
>         .....................
>
> Hoping it helps.
>
> Jieshan
> -----Original Message-----
> From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> Sent: Monday, December 16, 2013 6:48 PM
> To: user@hbase.apache.org
> Subject: Why so many unexpected files like partitions_xxxx are created?
>
> I imported data into HBase using bulk load, but after that I
> found many unexpected files were created in the HDFS directory of
> /user/root/, and they look like these:
>
> /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> ... ...
> ... ...
>
>
> It seems that they are HFiles, but I don't know why they were created here?
>
> I bulk load data into HBase in the following way:
>
> Firstly,   I wrote a MapReduce program which only has map tasks. The map
> tasks read some text data and emit them in the form of  RowKey and
> KeyValue. The following is my program:
>
>         @Override
>         protected void map(NullWritable NULL, GtpcV1SignalWritable signal,
> Context ctx) throws InterruptedException, IOException {
>             String strRowkey = xxx;
>             byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
>
>             rowkey.set(rowkeyBytes);
>
>             part1.init(signal);
>             part2.init(signal);
>
>             KeyValue kv = new KeyValue(rowkeyBytes, Family_A, Qualifier_Q,
> part1.serialize());
>             ctx.write(rowkey, kv);
>
>             kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
> part2.serialize());
>             ctx.write(rowkey, kv);
>         }
>
>
> after the MR program finished, there were several HFiles generated in the
> output directory I specified.
>
> Then I began to load these HFiles into HBase using the following command:
>        hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> HFiles-Dir  MyTable
>
> Finally, I could see that the data were indeed loaded into the table in
> HBase.
>
>
> But, I could also see that there were many unexpected files generated in
> the HDFS directory of  /user/root/,  just as I have mentioned at the
> beginning of this mail, and I did not specify any files to be produced in
> this directory.
>
> What happened? Who can tell me what these files are and who produced them?
>
> Thanks
>

RE: Why so many unexpected files like partitions_xxxx are created?

Posted by Bijieshan <bi...@huawei.com>.
The reduce partition information is stored in this partitions_XXXX file. See the below code:

HFileOutputFormat#configureIncrementalLoad:
	.....................
    Path partitionsPath = new Path(job.getWorkingDirectory(),
                                   "partitions_" + UUID.randomUUID());
    LOG.info("Writing partition information to " + partitionsPath);

    FileSystem fs = partitionsPath.getFileSystem(conf);
    writePartitions(conf, partitionsPath, startKeys);
	.....................
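
For context: in stock Hadoop, a partitions file like this is what drives a
total-order sort, wired in roughly as follows (illustrative only, not the
exact HBase code, which varies by version):

    job.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionsPath);

configureIncrementalLoad does the equivalent internally, which is why one
such file is written per prepared job.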

Hoping it helps.

Jieshan
-----Original Message-----
From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com] 
Sent: Monday, December 16, 2013 6:48 PM
To: user@hbase.apache.org
Subject: Why so many unexpected files like partitions_xxxx are created?

I imported data into HBase using bulk load, but after that I found many unexpected files were created in the HDFS directory of /user/root/, and they look like these:

/user/root/partitions_fd74866b-6588-468d-8463-474e202db070
/user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
/user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
/user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
/user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
/user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
/user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
/user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
/user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
/user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
/user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
/user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
/user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
... ...
... ...


It seems that they are HFiles, but I don't know why they were created here?

I bulk load data into HBase in the following way:

Firstly,   I wrote a MapReduce program which only has map tasks. The map
tasks read some text data and emit them in the form of RowKey and KeyValue. The following is my program:

        @Override
        protected void map(NullWritable NULL, GtpcV1SignalWritable signal, Context ctx) throws InterruptedException, IOException {
            String strRowkey = xxx;
            byte[] rowkeyBytes = Bytes.toBytes(strRowkey);

            rowkey.set(rowkeyBytes);

            part1.init(signal);
            part2.init(signal);

            KeyValue kv = new KeyValue(rowkeyBytes, Family_A, Qualifier_Q, part1.serialize());
            ctx.write(rowkey, kv);

            kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q, part2.serialize());
            ctx.write(rowkey, kv);
        }


after the MR program finished, there were several HFiles generated in the output directory I specified.

Then I began to load these HFiles into HBase using the following command:
       hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
HFiles-Dir  MyTable

Finally, I could see that the data were indeed loaded into the table in HBase.


But, I could also see that there were many unexpected files generated in the HDFS directory of /user/root/, just as I have mentioned at the beginning of this mail, and I did not specify any files to be produced in this directory.

What happened? Who can tell me what these files are and who produced them?

Thanks