Posted to dev@pig.apache.org by "Craig Macdonald (JIRA)" <ji...@apache.org> on 2008/02/04 19:29:08 UTC
[jira] Updated: (PIG-89) Too many spills to files causes
ArrayIndexOutOfBoundsException if new temp file can't be created
[ https://issues.apache.org/jira/browse/PIG-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Craig Macdonald updated PIG-89:
-------------------------------
Description:
Hello,
I am experimenting, trying to perform a DISTINCT on a medium-sized set of URLs - about 3 million (same set as I discussed previously - Utkarsh has a copy), this time in local execution mode.
Pig script:
{{
A = LOAD 'all_13122007.txt';
B = DISTINCT A;
store B into 'bla';
}}
This brings up these errors (I swapped two lines in DefaultDataBag to find the real error).
{{
2008-02-04 18:09:44,756 [Low Memory Detector] INFO org.apache.pig - low memory handler called init = 29491200(28800K) used = 269834064(263509K) committed = 307036160(299840K) max = 471662592(460608K)
2008-02-04 18:09:45,355 [Low Memory Detector] ERROR org.apache.pig - Unable to spill contents to disk
java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.checkAndCreate(File.java:1704)
at java.io.File.createTempFile(File.java:1793)
at java.io.File.createTempFile(File.java:1830)
at org.apache.pig.data.DataBag.getSpillFile(DataBag.java:367)
at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:69)
at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
at sun.management.Sensor.trigger(Sensor.java:120)
java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.remove(ArrayList.java:390)
at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:84)
at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
at sun.management.Sensor.trigger(Sensor.java:120)
Exception in thread "Low Memory Detector" java.lang.InternalError: Error in invoking listener
at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:141)
at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
at sun.management.Sensor.trigger(Sensor.java:120)
}}
There are two sub-issues here:
1. Pig spills too much with the default JVM heap size (64MB) - is this expected?
Perhaps pig.pl should set a default JVM heap size of more than 64MB?
2. The line at DefaultDataBag.java:84
{{{
mSpillFiles.remove(mSpillFiles.size() - 1);
}}}
should check that mSpillFiles.size() > 0, because if File.createTempFile() in DataBag.getSpillFile() fails, mSpillFiles will not yet have been updated. My preference would be to split the try { } catch (IOException ioe) { } within DefaultDataBag.spill() into two exception handlers - one for getSpillFile() errors, and one for actual writing errors (when we know mSpillFiles has been added to).
If this latter point isn't coherent, I can create a patch.
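To make the second point concrete, here is a minimal sketch of the two-handler idea. It is not the real DefaultDataBag source: the class SpillSketch, the SpillFileFactory interface (standing in for DataBag.getSpillFile()) and the use of plain String records are all assumptions made for illustration, and it uses newer Java syntax than JDK 1.6 for brevity.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SpillSketch {
    /** Hypothetical stand-in for DataBag.getSpillFile(). */
    interface SpillFileFactory {
        File create() throws IOException;
    }

    private final List<File> spillFiles = new ArrayList<>();
    private final List<String> contents = new ArrayList<>();
    private final SpillFileFactory factory;

    SpillSketch(SpillFileFactory factory) {
        this.factory = factory;
    }

    void add(String record) {
        contents.add(record);
    }

    int spillFileCount() {
        return spillFiles.size();
    }

    int inMemoryCount() {
        return contents.size();
    }

    /** Returns the number of records spilled, or 0 if the spill failed. */
    long spill() {
        File file;
        try {
            // Handler 1: creating the temp file failed, so nothing was
            // registered in spillFiles and there is nothing to remove.
            file = factory.create();
        } catch (IOException ioe) {
            System.err.println("Unable to create spill file: " + ioe.getMessage());
            return 0;
        }
        spillFiles.add(file);

        long spilled = 0;
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file))) {
            for (String record : contents) {
                out.write(record);
                out.newLine();
                spilled++;
            }
        } catch (IOException ioe) {
            // Handler 2: the file was registered above, so removing the
            // last entry is safe here.
            spillFiles.remove(spillFiles.size() - 1);
            System.err.println("Unable to spill contents to disk: " + ioe.getMessage());
            return 0;
        }
        contents.clear();
        return spilled;
    }

    public static void main(String[] args) {
        // Simulate the "Too many open files" failure from the log above.
        SpillSketch bag = new SpillSketch(() -> {
            throw new IOException("Too many open files");
        });
        bag.add("http://example.org/");
        System.out.println(bag.spill() + " records spilled, "
                + bag.spillFileCount() + " spill files registered");
    }
}
```

With the handlers separated like this, a failure to create the temp file leaves both the in-memory contents and the spill file list untouched, so the remove() call can never see an empty list.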
Ta muchly.
C
> Too many spills to files causes ArrayIndexOutOfBoundsException if new temp file can't be created
> ------------------------------------------------------------------------------------------------
>
> Key: PIG-89
> URL: https://issues.apache.org/jira/browse/PIG-89
> Project: Pig
> Issue Type: Bug
> Components: data
> Environment: Linux, Local execution Mode, JDK 1.6
> Reporter: Craig Macdonald
Re: Parameter Substitution in Pig
Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
This approach seems fairly brittle. For example whitespace in a
parameter can have unexpected results.
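As an illustration of the whitespace concern, here is a small sketch of purely textual substitution. The $name placeholder syntax, the class NaiveSubstitution and the sample script are assumptions made for this example only; they are not taken from the wiki proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NaiveSubstitution {
    /** Replaces every $name in the script with its value, as plain text. */
    static String substitute(String script, Map<String, String> params) {
        String result = script;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            // String.replace does literal (non-regex) replacement.
            result = result.replace("$" + entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("input", "my urls.txt"); // value contains a space

        // The placeholder is unquoted, so the space splits the value
        // into two tokens once it is pasted into the script.
        System.out.println(substitute("A = LOAD $input;", params));
        // prints: A = LOAD my urls.txt;
    }
}
```

The substituted line no longer means what the author of the script intended, and a preprocessor that works on raw text cannot detect that.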
Given the use of a distinct preprocessor, are there any existing ones
that make sense?
Does the Pig language support literals with whitespace, carriage
returns and so on? If so, how does one define a parameter in a file
with such characters in it?
How does this approach compare to solutions used in SQL?
In Hadoop, the discussion would be in a JIRA. Does moving the
discussion to a JIRA make sense?
On Feb 4, 2008, at 10:52 AM, Olga Natkovich wrote:
>
> Hi,
>
> I have posted a proposal for parameter substitution in pig. Please,
> review and comment:
>
> http://wiki.apache.org/pig/ParameterSubstitution
>
> Olga
Parameter Substitution in Pig
Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Hi,
I have posted a proposal for parameter substitution in pig. Please,
review and comment:
http://wiki.apache.org/pig/ParameterSubstitution
Olga
Re: Pig Streaming Specification
Posted by Xu Zhang <xu...@yahoo-inc.com>.
2 possible typos:
- "<comparison spec>" might need to be "<computation spec>"
- "date is grouped and sorted" might need to be "data is grouped and sorted"
Best regards,
Xu
> From: Olga Natkovich <ol...@yahoo-inc.com>
> Reply-To: <pi...@incubator.apache.org>
> Date: Mon, 4 Feb 2008 10:51:16 -0800
> To: <pi...@incubator.apache.org>
> Conversation: Pig Streaming Specification
> Subject: Pig Streaming Specification
>
> Hi,
>
> I have posted a proposal for functional spec for Streaming in Pig.
> Please, review and comment:
>
> http://wiki.apache.org/pig/PigStreamingFunctionalSpec
>
> Olga
>
Pig Streaming Specification
Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Hi,
I have posted a proposal for functional spec for Streaming in Pig.
Please, review and comment:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec
Olga