Posted to user@hive.apache.org by Cam Bazz <ca...@gmail.com> on 2011/02/12 01:38:22 UTC

error out of all sudden

Hello,

I set up my one-node pseudo-distributed system and left it running with a cron job
that copies data from a remote server, loads it into Hadoop, and does some
calculations every hour.

It stopped working today, giving me the error below. I deleted everything and made
it reprocess from the beginning, and I still get the same error in the same place.

Is there a limit on how many partitions there can be in a table?

I tried for a couple of hours to solve the problem, but now my Hive fun is over...

Any ideas as to why this might be happening, or what I should do to debug it?

best regards,
-c.b.


11/02/12 01:27:47 INFO ql.Driver: Starting command: load data local
inpath '/var/mylog/hourly/log.CAT.2011021119' into table cat_raw
partition(date_hour=2011021119)
Copying data from file:/var/mylog/hourly/log.CAT.2011021119

11/02/12 01:27:47 INFO exec.CopyTask: Copying data from
file:/var/mylog/hourly/log.CAT.2011021119 to
hdfs://darkstar:9000/tmp/hive-cam/hive_2011-02-12_01-27-47_415_7165217842693560517/10000

11/02/12 01:27:47 INFO hdfs.DFSClient: Exception in
createBlockOutputStream java.io.EOFException
11/02/12 01:27:47 INFO hdfs.DFSClient: Abandoning block
blk_6275225343572661963_1859
11/02/12 01:27:53 INFO hdfs.DFSClient: Exception in
createBlockOutputStream java.io.EOFException
11/02/12 01:27:53 INFO hdfs.DFSClient: Abandoning block
blk_2673116090916206836_1859
11/02/12 01:27:59 INFO hdfs.DFSClient: Exception in
createBlockOutputStream java.io.EOFException
11/02/12 01:27:59 INFO hdfs.DFSClient: Abandoning block
blk_5414825878079983460_1859
11/02/12 01:28:05 INFO hdfs.DFSClient: Exception in
createBlockOutputStream java.io.EOFException
11/02/12 01:28:05 INFO hdfs.DFSClient: Abandoning block
blk_6043862611357349730_1859
11/02/12 01:28:11 WARN hdfs.DFSClient: DataStreamer Exception:
java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

11/02/12 01:28:11 WARN hdfs.DFSClient: Error Recovery for block
blk_6043862611357349730_1859 bad datanode[0] nodes == null
11/02/12 01:28:11 WARN hdfs.DFSClient: Could not get block locations.
Source file "/tmp/hive-cam/hive_2011-02-12_01-27-47_415_7165217842693560517/10000/log.CAT.2011021119"
- Aborting...
Failed with exception null
11/02/12 01:28:11 ERROR exec.CopyTask: Failed with exception null
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
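
For anyone debugging a similar failure, a few generic checks on a stock 0.20-era
pseudo-distributed install; the paths below are defaults and assumptions, not taken
from this setup:

    # Any missing or corrupt blocks, and overall file/block totals.
    hadoop fsck /

    # The client-side EOFException in createBlockOutputStream usually has a matching
    # error in the datanode log around the same timestamp.
    tail -n 200 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

    # Open-file limit for the user running the daemons (1024 is a common low default).
    ulimit -n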

Re: error out of all sudden

Posted by Vinithra Varadharajan <vi...@cloudera.com>.
Viral,

Regarding file handles, you were most likely running into this:
https://issues.apache.org/jira/browse/HIVE-1508

-Vinithra
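
One rough way to see whether handles are actually leaking, sketched here as a
generic check rather than anything from the thread (process names and counts will
vary; `jps` shows which Java PIDs belong to which daemon on your box):

    # List the Hadoop/Hive Java processes, then count open descriptors for each.
    jps
    for pid in $(jps | awk '{print $1}'); do
        echo "pid $pid: $(ls /proc/$pid/fd 2>/dev/null | wc -l) open file descriptors"
    done
    # Run this a few times while loads are going on; a count that climbs without
    # coming back down is the kind of leak described in HIVE-1508.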


Re: error out of all sudden

Posted by Cam Bazz <ca...@gmail.com>.
Yes, I am running on a single node, in pseudo-distributed mode.

I increased the file handle limit and the epoll interval, and finally raised dfs
max xcievers to 1024.

It no longer gives errors.

best regards,
-c.b.
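
For anyone hitting the same wall, the file-handle and xcievers changes mentioned
above roughly correspond to the sketch below (it is not clear which epoll setting
was meant, so that one is left out). The 1024 value is the one from this thread;
the descriptor limit is an arbitrary example, not a recommendation:

    # Check and raise the open-file limit for the user running Hadoop/Hive.
    ulimit -n            # current soft limit
    ulimit -n 65536      # raise for this shell; make it permanent via
                         # /etc/security/limits.conf for the hadoop/hive user

    # Raise the datanode transceiver cap in hdfs-site.xml, then restart the datanode.
    # On 0.20-era Hadoop the property keeps its historical misspelling:
    #   <property>
    #     <name>dfs.datanode.max.xcievers</name>
    #     <value>1024</value>
    #   </property>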


Re: error out of all sudden

Posted by Viral Bajaria <vi...@gmail.com>.
Are you running out of open file handles? You should look into that, because you
are running everything on one node. Check your namenode/datanode logs to make sure
that's not the case.

The error sounds like a block exception on HDFS; the Hive CopyTask is just moving
the file from your local mount point to a temporary HDFS location before it loads
into the correct partition.

I have emailed the group before, with no luck in getting a reply: my script, which
continuously loads files into Hive, uses a lot of file handles. I don't know whether
it's Hive that leaves file handles open or some other process. My script does not
run on the same box, so it's definitely not my script that is holding onto the
handles.

-Viral
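
If it is a handle or transceiver problem, the datanode log usually says so
explicitly. A sketch of what to look for, assuming a default log location; the
exact message wording varies by version, but on 0.20-era Hadoop it reads along the
lines of "xceiverCount ... exceeds the limit of concurrent xcievers":

    # Exhausted transceivers or file handles on the datanode side.
    grep -iE "xc(ie|ei)ver|Too many open files" $HADOOP_HOME/logs/hadoop-*-datanode-*.log | tail
    # Any corresponding complaints from the namenode.
    grep -i exception $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail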


Re: error out of all sudden

Posted by Cam Bazz <ca...@gmail.com>.
Yes, I have a lot of small files; this is because I wanted to process hourly
instead of daily.

I will check whether this is the case. I am now re-running the process, and I see:

332 files and directories, 231 blocks = 563 total. Heap Size is 119.88
MB / 910.25 MB (13%)
Configured Capacity	:	140.72 GB
DFS Used	:	6.63 MB
Non DFS Used	:	8.76 GB
DFS Remaining	:	131.95 GB
DFS Used%	:	0 %
DFS Remaining%	:	93.77 %

I do not think this is the case, but I will keep monitoring and will see in half
an hour.

best regards, and thanks a bunch.

-cam
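
Two quick ways to watch how files and blocks accumulate between the hourly loads
(the warehouse path below is the Hive default and is an assumption about this
setup):

    # File/directory/block totals as the namenode sees them.
    hadoop fsck / | grep -iE "total (dirs|files|blocks)"
    # Counts and bytes under the Hive warehouse.
    hadoop fs -count /user/hive/warehouse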




RE: error out of all sudden

Posted by "Christopher, Pat" <pa...@hp.com>.
If you're running with the defaults, I think it's around 20 GB. If you're processing
a couple hundred MB you could easily hit this limit between desired outputs and any
intermediate files created. HDFS allocates the available space in blocks, so if you
have a lot of small files you'll run out of blocks before you run out of space. This
is one reason why HDFS/Hadoop is 'bad' at dealing with lots of small files.

You can check here: localhost:50070 is the web page for your HDFS namenode. It has
status information on your HDFS, including size.

Pat
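
A couple of commands that show whether space, block counts, or an explicit quota
are actually the constraint; this is a generic sketch, and no particular limits are
implied:

    # Per-datanode capacity, DFS used, and DFS remaining.
    hadoop dfsadmin -report
    # Quota columns read 'none'/'inf' unless an explicit quota has been set.
    hadoop fs -count -q /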


Re: error out of all sudden

Posted by Cam Bazz <ca...@gmail.com>.
But is there a ridiculously low default limit on HDFS space? I looked everywhere in
the configuration files but could not find anything that limits the size of HDFS.

I think I am running on a 150 GB hard drive, and the data I am processing amounts
to a couple of hundred megabytes at most.

best regards,

-cam
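
For what it's worth: a stock hdfs-site.xml has no single property that caps total
HDFS size. Capacity is whatever is free on the partition(s) behind dfs.data.dir,
minus dfs.datanode.du.reserved if that is set. A quick way to check both; the data
directory shown is only the 0.20 default for when dfs.data.dir is left unset:

    # Is either property set explicitly?
    grep -E -A1 "dfs.data.dir|dfs.datanode.du.reserved" $HADOOP_CONF_DIR/hdfs-site.xml
    # Free space on the partition holding the datanode's blocks.
    df -h /tmp/hadoop-$USER/dfs/data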




RE: error out of all sudden

Posted by "Christopher, Pat" <pa...@hp.com>.
Is your hdfs hitting its space limits?

Pat
