You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@annotator.apache.org by GitBox <gi...@apache.org> on 2021/07/18 11:08:22 UTC

[GitHub] [incubator-annotator] DellCliff opened a new issue #115: Turn AsyncGenerators into Generators

DellCliff opened a new issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115


   What is the reason behind matchers producing AsyncGenerators. They don't do IO or wait on callbacks as far as I can see. Having an async function which doesn't need to be one has serious disadvantages, like async-infection up the call-chain, race conditions and so on.
   
   A simple modification of `packages/selector/src/text/match-text-position.ts` removes the unnecessary async.
   
   ```
   export function textPositionSelectorMatcher(
     selector: TextPositionSelector,
   ): <TChunk extends Chunk<any>>(
     scope: Chunker<TChunk>,
   ) => AsyncGenerator<ChunkRange<TChunk>, void, void> {
     const { start, end } = selector;
   
     return async function* matchAll<TChunk extends Chunk<string>>(
       textChunks: Chunker<TChunk>,
     ) {
       const codeUnitSeeker = new TextSeeker(textChunks);
       const codePointSeeker = new CodePointSeeker(codeUnitSeeker);
   
       codePointSeeker.seekTo(start);
       const startChunk = codeUnitSeeker.currentChunk;
       const startIndex = codeUnitSeeker.offsetInChunk;
       codePointSeeker.seekTo(end);
       const endChunk = codeUnitSeeker.currentChunk;
       const endIndex = codeUnitSeeker.offsetInChunk;
   
       yield { startChunk, startIndex, endChunk, endIndex };
     };
   }
   ```
   
   ```
   export function textPositionSelectorMatcher(
     selector: TextPositionSelector,
   ): <TChunk extends Chunk<any>>(
     scope: Chunker<TChunk>,
   ) => Generator<ChunkRange<TChunk>, void, void> {
     const { start, end } = selector;
   
     return function* matchAll<TChunk extends Chunk<string>>(
       textChunks: Chunker<TChunk>,
     ) {
       const codeUnitSeeker = new TextSeeker(textChunks);
       const codePointSeeker = new CodePointSeeker(codeUnitSeeker);
   
       codePointSeeker.seekTo(start);
       const startChunk = codeUnitSeeker.currentChunk;
       const startIndex = codeUnitSeeker.offsetInChunk;
       codePointSeeker.seekTo(end);
       const endChunk = codeUnitSeeker.currentChunk;
       const endIndex = codeUnitSeeker.offsetInChunk;
   
       yield { startChunk, startIndex, endChunk, endIndex };
     };
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] Treora commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
Treora commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882519923


   You are completely correct that there is no need for the current functions to be async. The reason they are async is to make it easier to do async stuff in the future, without needing to change the API. For example, fuzzy text search may be computationally expensive, and could be offloaded to a worker thread.
   
   Note that e.g. a `TextPositionSelector` can never produce multiple results, so besides not needing to be async its matcher would not even have be a generator. However, the idea is to have a coherency in the function signatures, and making all matchers return async generators seemed the most flexible option, allowing for easy composition of matchers; that is, to have a single function able to handle various types of Selectors, dispatching them to the appropriate functions, as we do [in the demo](https://github.com/apache/incubator-annotator/blob/a352fff9b9abda66d4627d4644ea5a2e2218d7c4/web/index.js#L101-L113).
   
   We are aware that this choice of generality does impose constraints on the users of the functions, who need to await each result, and whose functions thus have to become async themselves; ‘async infection’ as you call it. We have been thinking about a practical way to provide a synchronous API in addition to an asynchronous API. See the discussion in #81; suggestions are welcome!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] DellCliff edited a comment on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
DellCliff edited a comment on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882816203


   This kinda smells like speculative generality. IMO I would make those matchers not adhere to some common interface and let them have their own signature. A user can then pick and choose. Maybe supply helper functions which take a matcher and produce one with the desired signature, or offer both sync and async out of the box or fuzzy/non-fuzzy. One can always make a sync function into async one, but not the other way around.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] DellCliff commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
DellCliff commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882816203






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] Treora commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
Treora commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882519923


   You are completely correct that there is no need for the current functions to be async. The reason they are async is to make it easier to do async stuff in the future, without needing to change the API. For example, fuzzy text search may be computationally expensive, and could be offloaded to a worker thread.
   
   Note that e.g. a `TextPositionSelector` can never produce multiple results, so besides not needing to be async its matcher would not even have be a generator. However, the idea is to have a coherency in the function signatures, and making all matchers return async generators seemed the most flexible option, allowing for easy composition of matchers; that is, to have a single function able to handle various types of Selectors, dispatching them to the appropriate functions, as we do [in the demo](https://github.com/apache/incubator-annotator/blob/a352fff9b9abda66d4627d4644ea5a2e2218d7c4/web/index.js#L101-L113).
   
   We are aware that this choice of generality does impose constraints on the users of the functions, who need to await each result, and whose functions thus have to become async themselves; ‘async infection’ as you call it. We have been thinking about a practical way to provide a synchronous API in addition to an asynchronous API. See the discussion in #81; suggestions are welcome!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] DellCliff commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
DellCliff commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882816203






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] tilgovi edited a comment on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
tilgovi edited a comment on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-883082208


   > One can always make a sync function into async one, but not the other way around.
   
   Yes, that's the reason for making everything async now. If we compose higher level pipelines with sync interfaces then we won't be ample to incorporate async building blocks, but if all the composition is asynchronous then it admits synchronous or asynchronous building blocks.
   
   I hear the concern about speculative generality, but it's not baseless speculative. I had use cases like streaming corpuses in mind as we designed this.
   
   Initial designs had an interface that allowed selector implementations to implement synchronous or asynchronous functions (or both), but I dropped that to make the surface area smaller.
   
   In any case, feedback is appreciated and I'm open to changing this but I don't want to jump to conclusions. If you find you're really working with the APIs a lot and async is really proving cumbersome, I'd love to see examples of the kinds of code that feels hard to modify to be asynchronous.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] Treora commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
Treora commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882519923


   You are completely correct that there is no need for the current functions to be async. The reason they are async is to make it easier to do async stuff in the future, without needing to change the API. For example, fuzzy text search may be computationally expensive, and could be offloaded to a worker thread.
   
   Note that e.g. a `TextPositionSelector` can never produce multiple results, so besides not needing to be async its matcher would not even have be a generator. However, the idea is to have a coherency in the function signatures, and making all matchers return async generators seemed the most flexible option, allowing for easy composition of matchers; that is, to have a single function able to handle various types of Selectors, dispatching them to the appropriate functions, as we do [in the demo](https://github.com/apache/incubator-annotator/blob/a352fff9b9abda66d4627d4644ea5a2e2218d7c4/web/index.js#L101-L113).
   
   We are aware that this choice of generality does impose constraints on the users of the functions, who need to await each result, and whose functions thus have to become async themselves; ‘async infection’ as you call it. We have been thinking about a practical way to provide a synchronous API in addition to an asynchronous API. See the discussion in #81; suggestions are welcome!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] tilgovi commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
tilgovi commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-883082208


   > One can always make a sync function into async one, but not the other way around.
   
   Yes, that's the reason for making everything async now. If we compose higher level pipelines with sync interfaces then we won't be ample to incorporate async building blocks, but if all the composition is asynchronous then it admits synchronous or asynchronous building blocks.
   
   I hear the concern about speculative generality, but it's not entirely speculative. I had use cases like streaming corpuses in mind as we designed this.
   
   Initial designs had an interface that allowed selector implementations to implement synchronous or asynchronous functions (or both), but I dropped that to make the surface area smaller.
   
   In any case, feedback is appreciated and I'm open to changing this but I don't want to jump to conclusions. If you find you're really working with the APIs a lot and async is really proving cumbersome, I'd love to see examples of the kinds of code that feels hard to modify to be asynchronous.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] tilgovi commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
tilgovi commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-883082208


   > One can always make a sync function into async one, but not the other way around.
   
   Yes, that's the reason for making everything async now. If we compose higher level pipelines with sync interfaces then we won't be ample to incorporate async building blocks, but if all the composition is asynchronous then it admits synchronous or asynchronous building blocks.
   
   I hear the concern about speculative generality, but it's not entirely speculative. I had use cases like streaming corpuses in mind as we designed this.
   
   Initial designs had an interface that allowed selector implementations to implement synchronous or asynchronous functions (or both), but I dropped that to make the surface area smaller.
   
   In any case, feedback is appreciated and I'm open to changing this but I don't want to jump to conclusions. If you find you're really working with the APIs a lot and async is really proving cumbersome, I'd love to see examples of the kinds of code that feels hard to modify to be asynchronous.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] tilgovi commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
tilgovi commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-883082208


   > One can always make a sync function into async one, but not the other way around.
   
   Yes, that's the reason for making everything async now. If we compose higher level pipelines with sync interfaces then we won't be ample to incorporate async building blocks, but if all the composition is asynchronous then it admits synchronous or asynchronous building blocks.
   
   I hear the concern about speculative generality, but it's not entirely speculative. I had use cases like streaming corpuses in mind as we designed this.
   
   Initial designs had an interface that allowed selector implementations to implement synchronous or asynchronous functions (or both), but I dropped that to make the surface area smaller.
   
   In any case, feedback is appreciated and I'm open to changing this but I don't want to jump to conclusions. If you find you're really working with the APIs a lot and async is really proving cumbersome, I'd love to see examples of the kinds of code that feels hard to modify to be asynchronous.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] DellCliff edited a comment on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
DellCliff edited a comment on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882816203


   This kinda smells like speculative generality. IMO I would make those matchers not adhere to some common interface and let them have their own signature. A user can then pick and choose. Maybe supply helper functions which take a matcher and produce one with the desired signature, or offer both sync and async out of the box or fuzzy/non-fuzzy. One can always make a sync function into async one, but not the other way around.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] DellCliff commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
DellCliff commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882816203


   This kinda smells like speculative generality. IMO I would make those matchers not adhere to some common interface and let them have their own signature. A user can then pick and choose. Maybe supply helper functions which take a matcher and produce one with the desired signature, or offer both sync and async out of the box. One can always make a sync function into async one, but not the other way around.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] tilgovi edited a comment on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
tilgovi edited a comment on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-883082208


   > One can always make a sync function into async one, but not the other way around.
   
   Yes, that's the reason for making everything async now. If we compose higher level pipelines with sync interfaces then we won't be ample to incorporate async building blocks, but if all the composition is asynchronous then it admits synchronous or asynchronous building blocks.
   
   I hear the concern about speculative generality, but it's not baseless speculative. I had use cases like streaming corpuses in mind as we designed this.
   
   Initial designs had an interface that allowed selector implementations to implement synchronous or asynchronous functions (or both), but I dropped that to make the surface area smaller.
   
   In any case, feedback is appreciated and I'm open to changing this but I don't want to jump to conclusions. If you find you're really working with the APIs a lot and async is really proving cumbersome, I'd love to see examples of the kinds of code that feels hard to modify to be asynchronous.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] tilgovi edited a comment on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
tilgovi edited a comment on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-883082208


   > One can always make a sync function into async one, but not the other way around.
   
   Yes, that's the reason for making everything async now. If we compose higher level pipelines with sync interfaces then we won't be ample to incorporate async building blocks, but if all the composition is asynchronous then it admits synchronous or asynchronous building blocks.
   
   I hear the concern about speculative generality, but it's not baseless speculative. I had use cases like streaming corpuses in mind as we designed this.
   
   Initial designs had an interface that allowed selector implementations to implement synchronous or asynchronous functions (or both), but I dropped that to make the surface area smaller.
   
   In any case, feedback is appreciated and I'm open to changing this but I don't want to jump to conclusions. If you find you're really working with the APIs a lot and async is really proving cumbersome, I'd love to see examples of the kinds of code that feels hard to modify to be asynchronous.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] tilgovi commented on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
tilgovi commented on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-901270903


   I've been thinking more about this and I think we should try to tackle it for the next release.
   
   One option would be to just make sure that our API allows context to be passed in that makes it possible to process things (synchronously) in chunks, so that higher level, async APIs could limit the amount of work they do before returning to the event loop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-annotator] DellCliff edited a comment on issue #115: Turn AsyncGenerators into Generators

Posted by GitBox <gi...@apache.org>.
DellCliff edited a comment on issue #115:
URL: https://github.com/apache/incubator-annotator/issues/115#issuecomment-882816203


   This kinda smells like speculative generality. IMO I would make those matchers not adhere to some common interface and let them have their own signature. A user can then pick and choose. Maybe supply helper functions which take a matcher and produce one with the desired signature, or offer both sync and async out of the box or fuzzy/non-fuzzy. One can always make a sync function into async one, but not the other way around.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@annotator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org