You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@uima.apache.org by GitBox <gi...@apache.org> on 2020/11/30 13:17:21 UTC

[GitHub] [uima-uimaj] mjunsilo opened a new pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

mjunsilo opened a new pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88


   - Commented out second call to UimaSerializable._save_to_cas_data() during initSrcTgtIdMapsAndStrings, since it can cause feature structures that were previously collected for serialization during processIndexedFeatureStructures to be invalidated. Reason is because _save_to_cas_data could create a new feature structure during every call, which then has no valid serialization id, since these were assigned to the previous FS instance in the steps preceding this second save call. It's the new instance id that would be used to serialize the reference, and this ends up getting the value 0, because it is not known by the id mappings.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-744032273


   There are more serialization options that I could add to the tests, e.g. XCAS, but I also think we should instead create new issues, so that we can get this one closed, so I am fine with merging.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-738719899


   I think tracking _save_cas_data() is really the only solution, since the other is really a workaround for a bug IMHO, which I already thought about when I realized what the issue was. It requires API clients to follow some very strict protocol in order to work, which would be difficult to explain in documentation, and others would easily waste time on the same issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737289745


   I believe you have access to a significant amount of serialized CASes, right? Can you check if they fully survive a load/save/load round trip with the line removed?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737283009


   Check. I'll leave it from here, and let me know if you need anything more from my side. I haven't either figured out the reason for the additional call, but maybe something shows up if we start adding the tests for all CasIOUtils supported serializations given by SerialFormat. It's the same test just with different SerialFormat, so that should be easy to extend.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-741873441


   I now found the reason why the form 4 serializer still fails, which is because _save_to_cas_data() is also called at an even earlier step deep hidden inside the serializer constructor inside the AllFSs class at l. 157. This class doesn't yet track those calls, and the tracker would have to be shared with the other places where _save_to_cas_data() is called. However, the reference to this AllFSs object only seems to exists temporary to construct FS mappings inside the csds instance, so it seems rather impossible to keep track of that the way things are currently connected.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-738803210


   I finally committed the unit tests, so you can now arrange it how you like. Note that the form 4 test are still failing, since none of us has addressed the issue in this serializer, but after looking briefly at the code, it appears that we can transfer the FSs tracking solution to that serializer as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-741110085


   Thanks for looking into this further!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737285941


   Just do what you have in terms of Test into the PR at whatever place seems ok for you and I'll move it around as necessary. Fine by me if it is half Dinge with the Build falling on it. Ok?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737288890


   OK, I'll put it into the PR, so you can refactor it after as you find necessary.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-738336689


   So reading the code I would say that if `reuseInfoProvided` is not set in `BinaryCasSerDes6` line 749, then `processIndexedFeatureStructures()` is never called and in that case the Java objects would never get committed to the CAS if the `_save_to_cas_data()` call is commented out in `initSrcTgtIdMapsAndStrings()`.
   
   A workaround to the issue could be that the `_save_to_cas_data()` in your `FeatureMap` checks if there is already a target FSArray  with the correct size that could be filled and only to create a new array if that is not the case. In this case, the first call would initialize the FSArray if necessary and the second call would not throw it away, just (potentially redundantly) update it's content - I'll see if that helps...
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737347325


   Yes, we have a few still in v2, but most have been migrated to v3 by now. AFAIK none of them use yet any types that implement UimaSerializableFSs, but I can do a such test on a mixed set of them. How many would suffice for you?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737248492


   I had a look where to add the unit tests, but I am somewhat unsure about this. There doesn't seem to be any existing unit tests for this, except maybe for FSHashSet and related classes that implement UimaSerializableFSs, but they don't test the serialization integration. I therefore intent to transfer my tests from the example project to a new UimaSerializableFSsTest class in org.apache.uima.cas.impl, and replace any use of uimaFIT with CAS api calls, similar to how it is done in UimaV2CasCompatibilityTest. My next problem is where to put the type description XML, but there seems to be some already in ExampleCas, so I could put my type system description there. The types themselves don't seem to be generated at compile time, but they seem to be generated before and then committed as part of the project. I am not sure about the location though, but Sentence and Token are placed in org.apache.uima.cas.test, so I could put them there as well. Can you provide me with some guidance on 
 this, since I am a bit in the dark here? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart merged pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart merged pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737274576


   Thanks! I'll look at the test and fix it up. I'm also trying to wrap my head around whether the call you commented out is redundant or if there might be a case where it could be required.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737349097


   Ah, right. The line commented out does *only* affect the custom serialization. So nothing should change for any code/CASes that do not use it. So there should be no risk of breaking any existing CASes. Nevermind I guess 👍 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-742626601


   I added the identifiers from the sortedFSs list inside the csds to the tracking set, although I wasn't sure about the filtering going on there, and whether that could lead to some identifiers being missed, but this makes the unit tests pass.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737884529


   Turns out the test still fails when using some of the binary form 4 serialization formats, but the change I made only affects form 6, and it looks like a similar thing happening for form 4 and may need to be changed for this serializer as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo edited a comment on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo edited a comment on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-738723092


   I am sorry I haven't yet found the time to commit the tests to the PR, but I refactored the code as prep to move it to the new locations, so the only thing missing is to copy it to the uimaj repo. I'll try to do that as soon as possible, but I have been pretty hung up with other things at work.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-738723092


   I am sorry I haven't yet found the time to commit the tests to the PR, but I refactored the code as prep to move it to the new locations, so the only thing missing is to copy it to the uimaj repo. I'll try to do that as soon as possible, but I have been pretty hung up on other things at work.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-738618020


   @mjunsilo instead of commenting out the branch, I set up a solution that keeps track of which FSes `_save_to_cas_data()` has already been called on. That works for me with the test you provided via Jira.
   
   Does that tracking approach look ok to you as well?
   
   As you mentioned that form4 suffers from the same issue, would you like to look into the form4 serialization?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-741925257


   OK, I didn't notice that before, but the form 6 serializer has direct access to the AllFSs instance, which the form 4 serializer does not. It exists only temporary to produce the sortedFSs inside the csds (CommonSerDesSequential), which is some filtered subset of all FSs, so some identifiers could be missing if used as source instead. We could of course keep the reference inside the csds in order to get access, but I am not sure it's the right solution for this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-741927346


   I'd need to look at it more closely, but it seems that the `org.apache.uima.cas.impl.CommonSerDesSequential.setup(MarkerImpl, int)` method which internally calls the `AllFSs` returns the list of FSes that `AllFSs` has found. So e.g. `org.apache.uima.cas.impl.BinaryCasSerDes4.Serializer.serialize()` could use this return value and add the FSes to its tracking set.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] reckart commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
reckart commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-741879699


   Well, that happens in form6 as well. That is why I put these lines here: 
   
   https://github.com/apache/uima-uimaj/pull/88/files#diff-18b9502ac2a64a7a54abdf1d21a9c2b5d88e8336f2ac9c8b18c611794906591eR2850-R2852
   ```
           // AllFSs internally already causes _save_to_cas_data() to be called, so we have to add all
            // the FSes that are returned here to the uimaSerializableSavedToCas tracking set
            allFSs.getAllFSs().forEach(fs -> uimaSerializableSavedToCas.add(fs._id));
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-735780241


   I still need to implement unit tests that covers the case, which I will do ASAP.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [uima-uimaj] mjunsilo commented on pull request #88: [UIMA-6295] CAS transportable Java object not serialised or deserialised with compressed binary

Posted by GitBox <gi...@apache.org>.
mjunsilo commented on pull request #88:
URL: https://github.com/apache/uima-uimaj/pull/88#issuecomment-737284921


   I didn't think this through. The change is only in BinaryCasSerDes6, so it can only affect the different compressed binary variants of this. However, having the tests for all formats is still a good idea.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org