Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2020/05/18 19:58:52 UTC

[GitHub] [samza] lakshmi-manasa-g opened a new pull request #1362: SAMZA-2526: Azure blob system producer: do not commit blobs if avro DataFileWriter.close fails

lakshmi-manasa-g opened a new pull request #1362:
URL: https://github.com/apache/samza/pull/1362


   **Bug fix**
   **Symptom:** No visible symptoms observed yet. The potential symptom is the creation of invalid blobs.
   
   **Cause:** When Avro's DataFileWriter.close fails, the blob is still committed with the blocks uploaded so far, which means the sync marker flushed by DataFileWriter.close is absent, potentially producing invalid blobs.
   
   **Changes:** Do not commit the blob if DataFileWriter.close fails. Uploaded blocks are discarded in this scenario, and the user of the SystemProducer should catch the resulting exception and retry the messages to avoid data loss.
   
   **API changes:** Behavior change: the blob is not committed if DataFileWriter.close fails, and all uploaded blocks are discarded.
   
   **Upgrade Instructions:** Catch exceptions arising from AzureBlobSystemProducer.flush and retry the messages sent since the previous flush. Do not advance the checkpoint if flush fails.
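   A minimal sketch of that retry pattern, under stated assumptions: the `Producer` interface and the retry loop below are illustrative stand-ins, not Samza's actual `AzureBlobSystemProducer` API; only the contract (catch flush failures, replay messages since the last successful flush, don't advance the checkpoint until flush succeeds) comes from the PR.

   ```java
   import java.util.List;

   public class FlushRetrySketch {
       // Hypothetical stand-in for the producer; flush may now throw when
       // DataFileWriter.close fails and the blob is discarded.
       interface Producer {
           void send(String source, String msg);
           void flush(String source); // may throw on a discarded blob
       }

       static void flushWithRetry(Producer producer, String source,
                                  List<String> pendingSinceLastFlush, int maxAttempts) {
           for (int attempt = 1; ; attempt++) {
               try {
                   producer.flush(source);
                   pendingSinceLastFlush.clear(); // safe to advance the checkpoint now
                   return;
               } catch (RuntimeException e) {
                   if (attempt >= maxAttempts) {
                       throw e; // caller must not advance the checkpoint
                   }
                   // The blocks were discarded with the blob; resend everything
                   // buffered since the previous successful flush, then retry.
                   for (String msg : pendingSinceLastFlush) {
                       producer.send(source, msg);
                   }
               }
           }
       }
   }
   ```

   The key point is that the pending-message buffer is only cleared after a successful flush, mirroring "do not advance checkpoint if flush fails."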
   
   **Usage Instructions:** None
   
   **Tests:** Unit tests updated.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] lakshmi-manasa-g commented on pull request #1362: SAMZA-2526: Azure blob system producer: do not commit blobs if avro DataFileWriter.close fails

Posted by GitBox <gi...@apache.org>.
lakshmi-manasa-g commented on pull request #1362:
URL: https://github.com/apache/samza/pull/1362#issuecomment-631560831


   @bkonold yes, your understanding is correct.
   
   Repairing an invalid blob - yes, it is possible to repair one at least partially using newer versions of Avro (I believe 1.9 onwards). Since we have not encountered such a blob, I can't say for sure how much of the blob can be recovered.
   
   For the role of the last few bytes (aka synchronization marker) flushed by DFW - here is the documentation from DFW itself
   
   >  Data written by DFW is grouped into blocks (not to be confused with the Azure blocks being uploaded). A synchronization marker is written between blocks, so that files may be split.
   
   DFW.close calls DFW.flush, which in turn calls DFW.sync(), whose doc says:
   >  Forces the end of the current block, emitting a synchronization marker. By default, this will also flush the block to the stream (AzureBlobOutputStream = ABOS in our case).
   
   I just realized while responding to your comment that AvroWriter.close is always preceded by AvroWriter.flush - that is the requirement for close per AvroWriter's documentation, and that is how the SystemProducer uses it right now. Now, AvroWriter.flush also calls DFW.flush, so DFW.flush is called twice in succession, and the second sync() becomes a no-op since there are no DFW blocks to force the end of. This could very well be why we have not seen invalid blobs due to this issue.
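   A toy model of that reasoning - this is not Avro's real DataFileWriter code, only a sketch that mimics the documented behavior described above: flush ends the current block and emits a sync marker only when there is buffered data, so a second flush in a row is a no-op.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Toy model (not Avro): append buffers data; flush ends the current
   // block with a sync marker only if the block is non-empty.
   public class SyncMarkerModel {
       private final List<String> out = new ArrayList<>();
       private int buffered = 0;

       void append(String datum) { buffered++; out.add(datum); }

       void flush() { // models DFW.flush -> DFW.sync
           if (buffered > 0) {
               out.add("SYNC_MARKER");
               buffered = 0;
           } // no buffered block: sync is a no-op
       }

       void close() { flush(); } // models DFW.close calling flush

       List<String> output() { return out; }
   }
   ```

   With this model, `append` + `flush` + `close` emits exactly one marker: the flush inside close finds nothing buffered and does nothing, matching the "second sync() becomes a no-op" observation.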
   
   Let me mull over this a bit more and get back.
   Part of me still says this PR is necessary, for two reasons: it is essential to discard the blob, and we should not rely on DFW implementation details (DFW.close calling flush, or DFW.flush calling DFW.sync). Also, since the exception will be bubbled up anyway even if we commit the blob despite it, the SystemProducer user would most likely have retried the messages for this blob, resulting in another blob with the same messages being created. Two blobs instead of one does not break our promise but may be undesirable.
   
   Another part of me says that this PR is not a must-have but a nice-to-have: the current implementations of DFW and AvroWriter work well enough that the DFW.flush inside DFW.close is effectively a no-op. So essentially DFW.close is only doing ABOS.close, I think, and in case of a DFW.close failure, doing ABOS.close might not be totally bad. However, this is very dependent on the current implementation of DFW and the binary encoders it uses to write to ABOS.
   
   
   
   





[GitHub] [samza] cameronlee314 merged pull request #1362: SAMZA-2526: Azure blob system producer: do not commit blobs if avro DataFileWriter.close fails

Posted by GitBox <gi...@apache.org>.
cameronlee314 merged pull request #1362:
URL: https://github.com/apache/samza/pull/1362


   





[GitHub] [samza] bkonold commented on pull request #1362: SAMZA-2526: Azure blob system producer: do not commit blobs if avro DataFileWriter.close fails

Posted by GitBox <gi...@apache.org>.
bkonold commented on pull request #1362:
URL: https://github.com/apache/samza/pull/1362#issuecomment-631758857


   @lakshmi-manasa-g I agree with your point that relying on implementation details is not ideal; we can rely on contracts established through the library's documentation, but implementations can and will change, as you imply.
   
   If possible, let's speak about this offline; I'd like to better understand the flow here and how we can most clearly delineate responsibility between flush and close.





[GitHub] [samza] lakshmi-manasa-g commented on pull request #1362: SAMZA-2526: Azure blob system producer: do not commit blobs if avro DataFileWriter.close fails

Posted by GitBox <gi...@apache.org>.
lakshmi-manasa-g commented on pull request #1362:
URL: https://github.com/apache/samza/pull/1362#issuecomment-631274573


   @bkonold 
   
   Let me start with what flush and close mean for the AzureBlobWriter interface. The expectation is that after flush, all data sent so far has been uploaded as a block to Azure, while close means the blob is sealed (aka created or committed) with all the blocks so far. Thus flush can be a frequently occurring operation, whereas close is the final operation, after which the writer cannot be opened to write any more messages.
   So the flow is something like this:
                               writer.write (several times) --> writer.flush (can go back to write) --> writer.close
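   A simplified sketch of that contract - the class below is illustrative, not Samza's actual AzureBlobWriter implementation; the method names and the flush/close semantics are the only things taken from the description above.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class WriterContractSketch {
       // Illustrative writer: flush uploads buffered data as one block and
       // allows further writes; close seals the blob and ends the writer.
       static class Writer {
           private final List<String> buffer = new ArrayList<>();
           private final List<List<String>> uploadedBlocks = new ArrayList<>();
           private boolean closed = false;

           void write(String msg) {
               if (closed) throw new IllegalStateException("writer is closed");
               buffer.add(msg);
           }

           // flush: everything sent so far goes up as one Azure block;
           // writing can resume afterwards.
           void flush() {
               if (!buffer.isEmpty()) {
                   uploadedBlocks.add(new ArrayList<>(buffer));
                   buffer.clear();
               }
           }

           // close: final operation - seal (commit) the blob from all
           // blocks; no writes are accepted after this.
           void close() {
               flush();
               closed = true;
           }

           int blockCount() { return uploadedBlocks.size(); }
           boolean isClosed() { return closed; }
       }
   }
   ```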
                                              
   Now coming to the AzureBlobAvroWriter implementation - it carries the same meaning as the interface expects. The only difference arises from the use of Avro's DataFileWriter, used as follows:
               avroWriter --> DataFileWriter (avro piece) --> AzureBlobOutputStream (uploads/commits to Azure)
   
   AzureBlobAvroWriter.close has to call DataFileWriter.close, which internally calls flush and writes some bytes (marking the end of an Avro block) to the output stream, which puts them into the last block of the blob.
   
   Now, finally, coming to the current issue: earlier, the blob was committed by calling AzureBlobOutputStream.close even when DataFileWriter.close failed, which meant the blocks uploaded so far were used to make a blob - but this might miss the last bytes from DFW, leading to an invalid/unreadable blob. The better thing to do is to discard the blob so that the user of the system producer has a chance to retry all the messages for the blob.
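   A hedged sketch of the fix - the interfaces below are simplified, hypothetical stand-ins for DataFileWriter.close and AzureBlobOutputStream, not Samza's real classes; only the decision (discard instead of commit when the Avro close step fails) reflects the PR.

   ```java
   public class CloseBehaviorSketch {
       interface AvroCloser { void close() throws Exception; } // stands in for DataFileWriter.close
       interface BlobStream {
           void commitBlob();  // stands in for AzureBlobOutputStream.close (seals the blob)
           void discardBlob(); // drop the uploaded blocks instead of committing
       }

       static void closeWriter(AvroCloser avro, BlobStream stream) {
           try {
               avro.close();
           } catch (Exception e) {
               // Before the fix: commitBlob() ran anyway, risking an
               // invalid blob missing the final Avro bytes.
               stream.discardBlob();
               throw new RuntimeException("blob discarded; caller should retry its messages", e);
           }
           stream.commitBlob(); // only commit after a clean Avro close
       }
   }
   ```

   Re-throwing after the discard is what lets the SystemProducer user catch the failure at flush time and retry, instead of silently ending up with a possibly unreadable blob.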
   
   I know this is a very long response, but I hope it helps clarify things.





[GitHub] [samza] bkonold commented on pull request #1362: SAMZA-2526: Azure blob system producer: do not commit blobs if avro DataFileWriter.close fails

Posted by GitBox <gi...@apache.org>.
bkonold commented on pull request #1362:
URL: https://github.com/apache/samza/pull/1362#issuecomment-631259676


   @lakshmi-manasa-g Can you explain the relationship between flush and close within AzureBlobAvroWriter?

