Posted to dev@tika.apache.org by "Luis Filipe Nassif (Created) (JIRA)" <ji...@apache.org> on 2012/03/28 01:34:29 UTC

[jira] [Created] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader
-------------------------------------------------------------------------------------------

                 Key: TIKA-885
                 URL: https://issues.apache.org/jira/browse/TIKA-885
             Project: Tika
          Issue Type: Improvement
          Components: metadata, parser
    Affects Versions: 1.0
         Environment: jre 1.6_25 x64 and Windows7 Enterprise x64
            Reporter: Luis Filipe Nassif
            Priority: Minor


The Oracle PipedReader and PipedWriter classes have a bug that prevents them from executing concurrently: they notify each other only when the pipe is full or empty, and not after a char is read from or written to the pipe. So I modified ParsingReader to use modified versions of PipedReader and PipedWriter, similar to the GNU versions, that work concurrently. However, sometimes, with certain files, I get the following error:

java.util.ConcurrentModificationException
                at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
                at java.util.HashMap$KeyIterator.next(Unknown Source)
                at java.util.AbstractCollection.toArray(Unknown Source)
                at org.apache.tika.metadata.Metadata.names(Metadata.java:146)

This happens because the ParsingReader.ParsingTask thread writes metadata while the ParsingReader thread is reading it, for files that contain metadata beyond their initial bytes. It does not occur with the current implementation, because the Java PipedReader and PipedWriter block each other, which is a performance bug that affects ParsingReader; that could be fixed in a future Java release. I think a defensive approach would be to synchronize access to the private Metadata.metadata Map, which would avoid a possible future problem with ParsingReader.
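
For illustration, the failure in the stack trace can be reproduced without Tika. The following is a minimal sketch, not Tika code: MetadataRaceSketch and SynchronizedNames are hypothetical stand-ins for a metadata holder backed by a HashMap. HashMap iterators are fail-fast, so a structural change between two iterator steps triggers the exception whichever thread makes it; the sketch makes the change on the same thread only to keep the outcome deterministic. The nested class shows the defensive option described above, where every access goes through one lock and names() returns a snapshot.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for a metadata holder; this is not the Tika Metadata class.
class MetadataRaceSketch {

    // Returns true if adding a key between two iterator steps
    // triggers the fail-fast check seen in the stack trace.
    static boolean iterationFailsAfterPut() {
        Map<String, String> metadata = new HashMap<String, String>();
        metadata.put("title", "a");
        metadata.put("author", "b");
        Iterator<String> names = metadata.keySet().iterator();
        names.next();
        // Simulates the parser thread adding a field in the middle of the iteration.
        metadata.put("created", "c");
        try {
            names.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Defensive variant: every access goes through the same monitor.
    static final class SynchronizedNames {
        private final Map<String, String> metadata = new HashMap<String, String>();

        synchronized void set(String name, String value) {
            metadata.put(name, value);
        }

        // Copies the key set while holding the lock, so callers iterate a snapshot.
        synchronized String[] names() {
            return metadata.keySet().toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        System.out.println("iteration failed: " + iterationFailsAfterPut());
        SynchronizedNames safe = new SynchronizedNames();
        safe.set("title", "a");
        System.out.println("names: " + safe.names().length);
    }
}
```

Synchronizing each method protects single calls only; a caller that reads names() and then looks up each value may still observe fields added in between.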

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409102#comment-13409102 ] 

Jukka Zitting commented on TIKA-885:
------------------------------------

Hmm, that is a good point! I guess the best way to solve this, apart from making Metadata fully synchronized, would be to pass a copy of the given metadata object to the parsing process in the background thread, and then explicitly copy any updates back to the original Metadata instance when the client calls read() or other methods on the reader instance. A bit like how we handle the transmission of an exception across the threads.
                

[jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475865#comment-13475865 ] 

Luis Filipe Nassif commented on TIKA-885:
-----------------------------------------

Ok, I got the idea. I think it will solve the problem. Setting a flag on metadata changes and testing for it on reads and writes would save unnecessary copies and synchronization.

I will open a new issue describing the improvement on PipedReader and PipedWriter.
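
One possible reading of the flag idea, as a minimal sketch: DirtyFlagSketch and all of its members are hypothetical names, and plain maps stand in for the three Metadata instances of the copy-based proposal. The parser publishes to the shared map only when its own values changed, and the client takes the lock only when the shared map was marked dirty since its last copy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the flag idea; none of these names exist in Tika.
class DirtyFlagSketch {
    private final Map<String, String> parserValues = new HashMap<String, String>();
    private final Map<String, String> sharedValues = new HashMap<String, String>();
    private final Map<String, String> clientValues = new HashMap<String, String>();

    // Touched only by the parser thread: set when parserValues changes.
    private boolean parserChanged = false;
    // Set by the parser after publishing, cleared by the client before copying.
    private final AtomicBoolean sharedDirty = new AtomicBoolean(false);
    // Touched only by the client thread: counts copies actually performed.
    int clientCopies = 0;

    // Parser thread: record a metadata value.
    void parserSet(String name, String value) {
        parserValues.put(name, value);
        parserChanged = true;
    }

    // Parser thread: called on each write() to the pipe.
    void onParserWrite() {
        if (!parserChanged) {
            return; // nothing new, skip the copy and the lock
        }
        synchronized (sharedValues) {
            sharedValues.putAll(parserValues);
            sharedDirty.set(true);
        }
        parserChanged = false;
    }

    // Client thread: called on each read() from the pipe.
    void onClientRead() {
        if (!sharedDirty.getAndSet(false)) {
            return; // nothing published since the last copy
        }
        synchronized (sharedValues) {
            clientValues.putAll(sharedValues);
        }
        clientCopies++;
    }

    // Client thread: read a value from the client's private copy.
    String clientGet(String name) {
        return clientValues.get(name);
    }

    public static void main(String[] args) {
        DirtyFlagSketch sketch = new DirtyFlagSketch();
        sketch.parserSet("title", "example");
        sketch.onParserWrite();
        sketch.onClientRead();
        sketch.onClientRead(); // second read finds the flag cleared and copies nothing
        System.out.println(sketch.clientGet("title") + " copies=" + sketch.clientCopies);
    }
}
```

Because the client clears the flag before it copies, an update published in between sets the flag again and is picked up on the next read, at the cost of one redundant copy.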
                

[jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426842#comment-13426842 ] 

Jukka Zitting commented on TIKA-885:
------------------------------------

What I had in mind was something like a {{Metadata.copyFrom(Metadata)}} method that would copy all metadata from one instance to another. We'd then have three {{Metadata}} instances, one for the client, one for the parser and a shared one for passing updates from the parser to the client. Each {{write()}} in the background parser would do something like:

{code}
synchronized (sharedMetadata) {
    sharedMetadata.copyFrom(parserMetadata);
}
{code}

... and each {{read()}} by the client would do:

{code}
synchronized (sharedMetadata) {
    clientMetadata.copyFrom(sharedMetadata);
}
{code}

It's not terribly elegant, but should avoid the need to make all {{Metadata}} instances thread-safe.
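
The two fragments above can be put together in a self-contained sketch. It rests on assumptions: SimpleMetadata is a hypothetical miniature class, not the Tika Metadata class, and copyFrom is the proposed method rather than an existing one. Each thread touches only its own instance, and the shared instance is accessed only under its own lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of the three-instance proposal; not Tika code.
class MetadataHandoffSketch {

    static final class SimpleMetadata {
        private final Map<String, String> values = new HashMap<String, String>();

        void set(String name, String value) {
            values.put(name, value);
        }

        String get(String name) {
            return values.get(name);
        }

        // Copies every entry of the other instance into this one.
        void copyFrom(SimpleMetadata other) {
            values.putAll(other.values);
        }
    }

    private final SimpleMetadata clientMetadata = new SimpleMetadata();
    private final SimpleMetadata parserMetadata = new SimpleMetadata();
    private final SimpleMetadata sharedMetadata = new SimpleMetadata();

    // Parser thread: update the private copy, then publish it under the lock.
    void parserWrite(String name, String value) {
        parserMetadata.set(name, value);
        synchronized (sharedMetadata) {
            sharedMetadata.copyFrom(parserMetadata);
        }
    }

    // Client thread: pull published updates under the lock, then read privately.
    String clientRead(String name) {
        synchronized (sharedMetadata) {
            clientMetadata.copyFrom(sharedMetadata);
        }
        return clientMetadata.get(name);
    }

    // Runs one parser write on a background thread and reads it from the caller.
    static String demo() {
        final MetadataHandoffSketch sketch = new MetadataHandoffSketch();
        Thread parser = new Thread(new Runnable() {
            public void run() {
                sketch.parserWrite("title", "example");
            }
        });
        parser.start();
        try {
            parser.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.clientRead("title");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This sketch only adds and overwrites entries; removals made by the parser would not propagate through putAll and would need separate handling.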

bq. customized versions of PipedReader and PipedWriter classes that work concurrently

I'm not sure I understand. Perhaps you could describe the idea in more detail either on the dev@ list or in a separate improvement issue.
                

[jira] [Comment Edited] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475865#comment-13475865 ] 

Luis Filipe Nassif edited comment on TIKA-885 at 10/16/12 12:46 PM:
--------------------------------------------------------------------

Ok, I got the idea. I think it will solve the problem. Setting a flag on metadata changes and testing for it on reads and writes would save unnecessary copies and synchronization.

I will open a new issue describing the improvement on PipedReader and PipedWriter to use with ParsingReader.
                
      was (Author: lfcnassif):
    Ok, I got the idea. I think it will solve the problem. Setting a flag on metadata changes and testing for it on reads and writes would save unnecessary copies and synchronization.

I have opened a new issue TIKA-1007 describing the improvement on PipedReader and PipedWriter to use with ParsingReader.
                  

[jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426257#comment-13426257 ] 

Luis Filipe Nassif commented on TIKA-885:
-----------------------------------------

But how do we track updates to the metadata copy without changing the parsers? Iterating through the metadata copy in the reader thread can cause the same ConcurrentModificationException.

Although it does not solve the above problem, I suggest using customized versions of the PipedReader and PipedWriter classes that work concurrently. This could at best double the speed of ParsingReader on multicore machines. I can open a new improvement issue for this. What do you think?
                

[jira] [Comment Edited] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475865#comment-13475865 ] 

Luis Filipe Nassif edited comment on TIKA-885 at 10/16/12 12:44 PM:
--------------------------------------------------------------------

Ok, I got the idea. I think it will solve the problem. Setting a flag on metadata changes and testing for it on reads and writes would save unnecessary copies and synchronization.

I have opened a new issue TIKA-1007 describing the improvement on PipedReader and PipedWriter to use with ParsingReader.
                
      was (Author: lfcnassif):
    Ok, I got the idea. I think it will solve the problem. Setting a flag on metadata changes and testing for it on reads and writes would save unnecessary copies and synchronization.

I will open a new issue describing the improvement on PipedReader and PipedWriter.
                  

[jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408811#comment-13408811 ] 

Luis Filipe Nassif commented on TIKA-885:
-----------------------------------------

This problem, where metadata is accessed in the ParsingReader thread while the ParsingReader.ParsingTask thread is changing it, will probably happen easily with OpenJDK, because the OpenJDK PipedReader and PipedWriter work concurrently/correctly.
                

[jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476966#comment-13476966 ] 

Luis Filipe Nassif commented on TIKA-885:
-----------------------------------------

Opened TIKA-1007 describing the improvement of ParsingReader.
                

[jira] [Comment Edited] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408811#comment-13408811 ] 

Luis Filipe Nassif edited comment on TIKA-885 at 7/8/12 12:02 AM:
------------------------------------------------------------------

This problem, where metadata is accessed in the ParsingReader thread while the ParsingReader.ParsingTask thread is changing it, will probably happen easily with GNU Classpath, because its PipedReader and PipedWriter work concurrently/correctly.
                
      was (Author: lfcnassif):
    This problem, where metadata is accessed in the ParsingReader thread while the ParsingReader.ParsingTask thread is changing it, will probably happen easily with OpenJDK, because the OpenJDK PipedReader and PipedWriter work concurrently/correctly.
                  