Posted to common-dev@hadoop.apache.org by "lohit vijayarenu (JIRA)" <ji...@apache.org> on 2007/10/18 00:25:50 UTC

[jira] Created: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
-------------------------------------------------------------------------------------

                 Key: HADOOP-2071
                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.14.3
            Reporter: lohit vijayarenu


In Hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs throws
java.io.IOException: Mark/reset not supported.
This looks to be related to HADOOP-2067 (https://issues.apache.org/jira/browse/HADOOP-2067).

<stack trace>
Caused by: java.io.IOException: Mark/reset not supported
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353)
	at java.io.FilterInputStream.reset(FilterInputStream.java:200)
	at org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamXmlRecordReader.java:289)
	at org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(StreamXmlRecordReader.java:118)
	at org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(StreamXmlRecordReader.java:111)
	at org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader.java:73)
	at org.apache.hadoop.streaming.StreamXmlRecordReader.<init>(StreamXmlRecordReader.java:63)
</stack trace>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-2071:
--------------------------------

    Fix Version/s:     (was: 0.15.0)
                   0.16.0

Milind requested I move this to 0.16.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch
>
>


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2071:
-------------------------------------

    Attachment: HADOOP-2071-2.patch

With input from Raghu and Milind, here is an updated patch. It wraps the FSDataInputStream in a BufferedInputStream and eliminates seek(). The patch also includes a simple test case for StreamXmlRecordReader.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch
>
>


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535778 ] 

Raghu Angadi commented on HADOOP-2071:
--------------------------------------


Mark/reset are not supported anymore. If streaming must use mark/reset, it should use a BufferedInputStream over DFSInputStream.
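
A minimal sketch of this suggestion, using only java.io. NoMarkStream is a hypothetical stand-in for a stream without mark/reset support (it is not the real DFSInputStream), and the class and method names are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BufferedMarkResetSketch {

  /** Hypothetical stand-in for an input stream that lacks mark/reset support. */
  static final class NoMarkStream extends InputStream {
    private final byte[] data;
    private int next = 0;

    NoMarkStream(byte[] data) {
      this.data = data;
    }

    @Override
    public int read() {
      return next < data.length ? (data[next++] & 0xff) : -1;
    }
    // mark() and reset() are inherited from InputStream, whose reset() throws IOException.
  }

  /** Returns true if reset() on the unwrapped stream fails with an IOException. */
  static boolean rawResetFails(byte[] data) {
    InputStream raw = new NoMarkStream(data);
    try {
      raw.mark(16);
      raw.reset();
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  /** Wraps the stream in a BufferedInputStream and reads the first n bytes twice. */
  static String readTwice(byte[] data, int n) {
    try {
      InputStream in = new BufferedInputStream(new NoMarkStream(data));
      in.mark(n);                        // remember the current position
      String first = readChars(in, n);
      in.reset();                        // rewind; the bytes are served from the buffer
      String second = readChars(in, n);
      return first + "|" + second;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static String readChars(InputStream in, int n) throws IOException {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < n; i++) {
      int b = in.read();
      if (b < 0) {
        break;
      }
      sb.append((char) b);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] xml = "<rec>data</rec>".getBytes(StandardCharsets.US_ASCII);
    System.out.println(rawResetFails(xml));
    System.out.println(readTwice(xml, 5));
  }
}
```

The wrapper keeps the marked bytes in its own buffer, so the underlying stream never has to support reset().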

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536013 ] 

Raghu Angadi commented on HADOOP-2071:
--------------------------------------

After a little more discussion, it looks like using BufferedInputStream can get rid of the problem with seeking back as well, because we always seek within what we have recently read. So we would replace seek() with {{ reset(); skip(); }}.
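
A rough sketch of the reset-then-skip idea, assuming a mark was set before the bytes were read. The helper names (seekBack, byteAfterSeekBack) are hypothetical and not taken from the patch.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ResetSkipSketch {

  /** Rewinds to the last mark, then moves forward offsetFromMark bytes. */
  static void seekBack(InputStream in, long offsetFromMark) throws IOException {
    in.reset();
    long remaining = offsetFromMark;
    while (remaining > 0) {
      long skipped = in.skip(remaining);  // skip() may skip fewer bytes than requested
      if (skipped <= 0) {
        throw new IOException("could not skip to offset " + offsetFromMark);
      }
      remaining -= skipped;
    }
  }

  /** Marks at the start, reads readAhead bytes, seeks back, and returns the next byte. */
  static char byteAfterSeekBack(String text, int readAhead, long offsetFromMark) {
    try {
      InputStream in = new BufferedInputStream(
          new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)));
      in.mark(readAhead);
      for (int i = 0; i < readAhead; i++) {
        in.read();
      }
      seekBack(in, offsetFromMark);
      return (char) in.read();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(byteAfterSeekBack("abcdefgh", 6, 2)); // prints c
  }
}
```

This only works while the target lies between the mark and the current position, which matches the observation that the reader only seeks within recently read data.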


> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>         Attachments: HADOOP-2071-1.patch
>
>


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2071:
-------------------------------------

    Status: Patch Available  (was: Open)

Marking this as Patch Available.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch, HADOOP-2071-4.patch
>
>


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2071:
-------------------------------------

    Attachment: HADOOP-2071-3.patch

BufferedInputStream does not provide a way to get the current position in the stream, and updating the encapsulated FSDataInputStream again is like seeking back. So I now store the position in pos_ and update it accordingly. Attaching a new patch with this change and a test case. Could anyone please take a look?
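
One way such position tracking could look, as a sketch only. The class name and the markedPos field are assumptions, and the real patch keeps its counter in pos_ inside StreamXmlRecordReader.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class PositionTrackingSketch {
  private final BufferedInputStream in;
  private long pos;        // bytes consumed so far (plays the role of pos_)
  private long markedPos;  // value of pos at the last mark()

  PositionTrackingSketch(InputStream raw) {
    this.in = new BufferedInputStream(raw);
  }

  int read() throws IOException {
    int b = in.read();
    if (b >= 0) {
      pos++;               // count only bytes that were actually read
    }
    return b;
  }

  void mark(int readLimit) {
    in.mark(readLimit);
    markedPos = pos;
  }

  void reset() throws IOException {
    in.reset();
    pos = markedPos;       // the logical position moves back with the stream
  }

  long getPos() {
    return pos;
  }

  /** Reads 3 bytes, marks, reads 2 more, resets; returns the position after each step. */
  static long[] demo() {
    try {
      PositionTrackingSketch s =
          new PositionTrackingSketch(new ByteArrayInputStream(new byte[10]));
      s.read();
      s.read();
      s.read();
      long afterThree = s.getPos();
      s.mark(4);
      s.read();
      s.read();
      long afterFive = s.getPos();
      s.reset();
      return new long[] {afterThree, afterFive, s.getPos()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(demo())); // prints [3, 5, 3]
  }
}
```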
Thanks!

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch
>
>


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537762 ] 

Raghu Angadi commented on HADOOP-2071:
--------------------------------------

Regarding: {code}
-    long pos = in_.getPos();
     numNext++;
-    if (pos >= end_) {
+    if (bin_.available() == 0) {
{code}
I am not sure the above change is correct. Lohit, can you confirm this? I am not sure whether end_ is set to EOF or not, and {{available() == 0}} may not indicate EOF.
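
To illustrate the concern, here is a small sketch in which available() returns 0 while data is still readable. QuietStream is a hypothetical stream that inherits InputStream's default available(), which returns 0. It says nothing about how DFSInputStream behaves.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class AvailableIsNotEofSketch {

  /** Hypothetical stream that does not override available(), so it reports 0. */
  static final class QuietStream extends InputStream {
    private final byte[] data;
    private int next = 0;

    QuietStream(byte[] data) {
      this.data = data;
    }

    @Override
    public int read() {
      return next < data.length ? (data[next++] & 0xff) : -1;
    }
  }

  /** Returns {available() before reading, value of the first read()}. */
  static int[] availableThenRead(byte[] data) {
    try {
      InputStream bin = new BufferedInputStream(new QuietStream(data));
      int available = bin.available();  // nothing buffered yet and the source reports 0
      int firstByte = bin.read();       // yet a byte is still delivered
      return new int[] {available, firstByte};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Counts bytes by reading until read() returns -1, the reliable EOF signal. */
  static int countUntilEof(byte[] data) {
    try {
      InputStream bin = new BufferedInputStream(new QuietStream(data));
      int n = 0;
      while (bin.read() != -1) {
        n++;
      }
      return n;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] r = availableThenRead(new byte[] {'x', 'y'});
    System.out.println(r[0] + " " + (char) r[1]); // prints 0 x
    System.out.println(countUntilEof(new byte[] {'x', 'y'})); // prints 2
  }
}
```

In plain java.io, a read() result of -1 is the dependable end-of-stream signal, whereas available() is only an estimate of what can be read without blocking.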

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch
>
>


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2071:
-------------------------------------

    Attachment: HADOOP-2071-4.patch

Getting rid of an extra blank line in the patch.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch, HADOOP-2071-4.patch
>
>


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2071:
-------------------------------------

    Attachment: HADOOP-2071-1.patch

Attached is a patch that eliminates mark/reset.
In one place, seek() was called even after reset(), which made it redundant.
Could anyone please review this?
Thanks

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>         Attachments: HADOOP-2071-1.patch
>
>


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539145 ] 

Raghu Angadi commented on HADOOP-2071:
--------------------------------------

+1. Lohit also verified the patch with large files and with the original user app that triggered the bug.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch, HADOOP-2071-4.patch
>
>


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milind Bhandarkar updated HADOOP-2071:
--------------------------------------

    Fix Version/s: 0.15.0

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch
>
>


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541041 ] 

Hudson commented on HADOOP-2071:
--------------------------------

Integrated in Hadoop-Nightly #297 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/297/])

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch, HADOOP-2071-4.patch, HADOOP-2071-5.patch
>
>


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535977 ] 

Milind Bhandarkar commented on HADOOP-2071:
-------------------------------------------

Code reviewed:

-1.

The readlimit argument for mark is not honored in these changes. If one calls reset after more than readlimit bytes have been read after mark, that reset is supposed to throw an IOException.
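
For reference, a small sketch of how java.io.BufferedInputStream treats readlimit. It assumes the buffer size equals readlimit, since with a larger buffer BufferedInputStream can keep the mark valid for longer. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

final class ReadLimitSketch {

  /** Marks, reads bytesToRead bytes, then reports whether reset() still works. */
  static boolean resetSucceeds(int readLimit, int bytesToRead) {
    byte[] data = new byte[64];
    // Buffer size equals readLimit, so the mark is dropped once the limit is exceeded.
    BufferedInputStream in =
        new BufferedInputStream(new ByteArrayInputStream(data), readLimit);
    try {
      in.mark(readLimit);
      for (int i = 0; i < bytesToRead; i++) {
        in.read();
      }
      in.reset();  // throws IOException if the mark has been invalidated
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(resetSucceeds(4, 3)); // prints true
    System.out.println(resetSucceeds(4, 8)); // prints false
  }
}
```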

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>         Attachments: HADOOP-2071-1.patch
>
>


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537751 ] 

Milind Bhandarkar commented on HADOOP-2071:
-------------------------------------------

+1. code reviewed.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539201 ] 

Hadoop QA commented on HADOOP-2071:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12368778/HADOOP-2071-4.patch
against trunk revision r590273.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs -1.  The patch appears to introduce 2 new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1043/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1043/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1043/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1043/console

This message is automatically generated.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch, HADOOP-2071-4.patch


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milind Bhandarkar updated HADOOP-2071:
--------------------------------------

    Status: Patch Available  (was: Open)

Making patch available.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535980 ] 

Milind Bhandarkar commented on HADOOP-2071:
-------------------------------------------

I think we should wrap InputStream in_ in java.io.BufferedInputStream, as Raghu suggested, and keep the mark/reset-based implementation of StreamXmlRecordReader.
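
A minimal sketch of the wrapping idea, not the actual patch. NoMarkStream is a hypothetical stand-in for a stream whose reset() fails the way the stack trace shows; it is not the real DFSInputStream. BufferedInputStream then provides mark/reset from its own buffer:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a stream without mark/reset support. */
class NoMarkStream extends InputStream {
  private final InputStream in;

  NoMarkStream(String text) {
    this.in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
  }

  @Override public int read() throws IOException { return in.read(); }
  @Override public boolean markSupported() { return false; }
  @Override public void mark(int readlimit) { /* no-op: marks are not kept */ }
  @Override public void reset() throws IOException {
    throw new IOException("Mark/reset not supported");
  }
}

class BufferedMarkResetSketch {
  /** Reads up to n bytes one at a time and returns them as a string. */
  static String readN(InputStream in, int n) throws IOException {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < n; i++) {
      int b = in.read();
      if (b < 0) break;          // EOF
      sb.append((char) b);       // input is ASCII in this sketch
    }
    return sb.toString();
  }

  /** Marks, reads n bytes, resets, reads n bytes again; returns "first|second". */
  static String peekTwice(String text, int n) {
    try (InputStream in = new BufferedInputStream(new NoMarkStream(text))) {
      in.mark(n);                // the wrapper remembers this position
      String first = readN(in, n);
      in.reset();                // served from the wrapper's buffer
      String second = readN(in, n);
      return first + "|" + second;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** True if reset() on the unwrapped stand-in throws IOException. */
  static boolean rawResetFails(String text) {
    try {
      new NoMarkStream(text).reset();
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(rawResetFails("<rec>a</rec>"));
    System.out.println(peekTwice("<rec>a</rec>", 5));
  }
}
```

The wrapper only helps while the bytes read since mark() still fit within the readlimit passed to mark(), so the limit has to cover the longest look-ahead the reader performs.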

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>         Attachments: HADOOP-2071-1.patch


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2071:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Lohit!

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch, HADOOP-2071-4.patch, HADOOP-2071-5.patch


[jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536008 ] 

Raghu Angadi commented on HADOOP-2071:
--------------------------------------

bq. the readlimit argument for mark is not honored in these changes. If one calls reset after more than readlimit bytes have been read after mark, that reset is supposed to throw IOException.

We can just keep track of how many bytes we have read and, if that is larger than readlimit, throw an IOException, if we want to keep that behavior. Actually, we can just throw an exception if no record is found within readlimit (instead of reading until there is a match or EOF).
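
One way the byte-counting idea could look, as a sketch under assumptions rather than the code in the attached patches. SeekableBytes is a hypothetical in-memory stand-in that only borrows the getPos()/seek() method names; ReadLimitMarkStream and ReadLimitDemo are illustrative names as well:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical in-memory stand-in for a seekable stream. */
class SeekableBytes extends InputStream {
  private final byte[] data;
  private int pos = 0;

  SeekableBytes(byte[] data) { this.data = data; }

  @Override public int read() { return pos < data.length ? (data[pos++] & 0xff) : -1; }
  long getPos() { return pos; }
  void seek(long newPos) throws IOException {
    if (newPos < 0 || newPos > data.length) {
      throw new IOException("seek out of range: " + newPos);
    }
    pos = (int) newPos;
  }
}

/** Emulates mark/reset with seek(), counting bytes read since mark(). */
class ReadLimitMarkStream extends InputStream {
  private final SeekableBytes in;
  private long markPos = -1;        // -1 means no mark is set
  private int readLimit = 0;
  private long readSinceMark = 0;

  ReadLimitMarkStream(SeekableBytes in) { this.in = in; }

  @Override public int read() throws IOException {
    int b = in.read();
    if (b >= 0 && markPos >= 0) readSinceMark++;
    return b;
  }

  @Override public boolean markSupported() { return true; }

  @Override public void mark(int readlimit) {
    markPos = in.getPos();
    readLimit = readlimit;
    readSinceMark = 0;
  }

  @Override public void reset() throws IOException {
    if (markPos < 0) throw new IOException("Mark not set");
    if (readSinceMark > readLimit) {
      markPos = -1;                 // the mark is no longer valid
      throw new IOException("Read past readlimit since mark");
    }
    in.seek(markPos);               // seek back to the marked position
    readSinceMark = 0;
  }
}

class ReadLimitDemo {
  /** Marks with readlimit, reads n bytes, and reports whether reset() succeeded. */
  static boolean resetSucceeds(String text, int readlimit, int n) {
    ReadLimitMarkStream s =
        new ReadLimitMarkStream(new SeekableBytes(text.getBytes(StandardCharsets.UTF_8)));
    s.mark(readlimit);
    try {
      for (int i = 0; i < n; i++) s.read();
      s.reset();
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

For example, ReadLimitDemo.resetSucceeds("<a>x</a><a>y</a>", 8, 4) stays within the limit, while the same call with readlimit 4 and 8 bytes read does not.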

Lohit and I looked around the code, and it seems to seek back pretty heavily (pretty much for every record). Seeking back is pretty inefficient in DFS: it throws away the current buffers (both app and TCP) and starts a new connection in most cases. The current patch does not make this situation any worse. I wonder what the typical size of these records is.

One problem with using BufferedInputStream is that the current code uses getPos() and seek() in many places, and these are specific to FSDataInputStream. So it will need more changes to manage that.
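
A small sketch of why position tracking gets harder behind a buffer. PositionedBytes is a hypothetical stand-in that only borrows the getPos() name; it is not FSDataInputStream. Reading a single byte through BufferedInputStream makes the wrapper fill its buffer, so the underlying position runs ahead of the caller's logical position:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical in-memory stand-in exposing a getPos()-style position. */
class PositionedBytes extends InputStream {
  private final byte[] data;
  private int pos = 0;

  PositionedBytes(String text) { this.data = text.getBytes(StandardCharsets.UTF_8); }

  @Override public int read() { return pos < data.length ? (data[pos++] & 0xff) : -1; }
  long getPos() { return pos; }
}

class BufferedPositionSketch {
  /** Returns the underlying position after one byte is read through the buffer. */
  static long underlyingPosAfterOneByte(String text) {
    PositionedBytes raw = new PositionedBytes(text);
    BufferedInputStream in = new BufferedInputStream(raw);
    try {
      in.read();                    // the caller has consumed exactly one byte
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return raw.getPos();            // but the buffer has pulled in more than that
  }

  public static void main(String[] args) {
    System.out.println(underlyingPosAfterOneByte("<rec>a</rec>"));
  }
}
```

Code that calls getPos() on the underlying stream would therefore need to subtract whatever is still buffered, or track the logical position separately.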

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>         Attachments: HADOOP-2071-1.patch


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2071:
-------------------------------------

    Attachment: HADOOP-2071-5.patch

Looks like an unrelated contrib test failed. But there were 2 Findbugs warnings, which I have fixed; attaching a new patch.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch, HADOOP-2071-4.patch, HADOOP-2071-5.patch


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2071:
-------------------------------------

    Assignee: lohit vijayarenu

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>         Attachments: HADOOP-2071-1.patch


[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2071:
-------------------------------------

    Status: Open  (was: Patch Available)

Thanks, Raghu. I will look into this case and resubmit a new one. Canceling the patch.

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch