Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID> on 2022/06/03 13:31:20 UTC

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

Hi Tim,

>we ran into issues in previous attempts at migration with the large file sizes in our repo

Indeed we did, and over the years I have had thoughts on that.  

Those large files are large ml models, which are (mostly) static, replaceable/interchangeable, not always necessary, and in separate resource (-res) modules separated from code modules.

When I was a ctakes newbie, I really disliked the separation of code from resources into entirely separate -res modules.  Since then, through working on projects that use ctakes code but not (huge) resources as dependencies, I have realized the wisdom of the modular separation.  In fact, I put a -huge- model in its own -res module so that I could <exclude> it from a ctakes-dependent project, saving compile (download) time and disk space.  Like you, I don't like to "download the internet" with maven   ;^)
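
A minimal sketch of that kind of exclusion in a dependent project's pom.xml.  In Maven it is spelled as an <exclusion> inside the dependency declaration.  The artifact names and the version property here are hypothetical placeholders, not actual ctakes module names:

```xml
<!-- Hypothetical coordinates: substitute the real code and -res modules. -->
<dependency>
  <groupId>org.apache.ctakes</groupId>
  <artifactId>ctakes-example</artifactId>
  <version>${ctakes.version}</version>
  <exclusions>
    <!-- Skip the huge model module so Maven never downloads it. -->
    <exclusion>
      <groupId>org.apache.ctakes</groupId>
      <artifactId>ctakes-example-models-res</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

This only works because the model sits in its own module; a model bundled into the code jar could not be left out this way.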

Right now we have the ner dictionaries in sourceforge, not the apache repos.  While this is done for legal reasons it has worked pretty well.

I think that we could maintain an apache SVN repo of -res modules containing only huge model files.   I am guessing that we would have to make it a "side/sub project" to maintain a separate repo (jenkins build, etc.).   

Anyway, it would give us the freedom to use a github repo for code (and non-model resources) without users needing to go through the github large-file workflow, which I see as a barrier to entry.

Thoughts?

________________________________________
From: Miller, Timothy <Ti...@childrens.harvard.edu.INVALID>
Sent: Thursday, June 2, 2022 6:21 PM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

My recollection was that we ran into issues in previous attempts at migration with the large file sizes in our repo.
Tim


On Thu, 2022-06-02 at 20:55 +0000, Finan, Sean wrote:

Thank you Gandhi and Richard.


Unless somebody else beats me to it I will perform some research and see what approaches can be used and which might be best.  In the end the cTAKES Project Management Committee will need to vote for any action as sweeping as moving to github.


Sean

________________________________________

From: gandhi rajan <gandhirajan.n@gmail.com>
Sent: Thursday, June 2, 2022 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi Sean,


If we are sure that the SVN has all the latest changes and active development is primarily on SVN, then why don't we request a fresh git repository and push all the changes over there?

More info on https://infra.apache.org/svn-to-git-migration.html



On Thu, Jun 2, 2022 at 5:52 PM Finan, Sean <Sean.Finan@childrens.harvard.edu.invalid> wrote:


Hi Richard, you bring up a valid concern.


cTAKES Developers:


The Apache Foundation has had an initiative to "move" all projects to GitHub for some time now.

I don't know much about how this is done.  If anybody out there has knowledge or experience that they can pass on, please share.


Thanks,

Sean

________________________________________

From: Richard Eckart de Castilho <rec@apache.org>
Sent: Thursday, June 2, 2022 3:39 AM
To: dev@ctakes.apache.org
Subject: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi,


it appears that the GitHub mirror of Apache cTAKES may be stuck.

When I check the svn log of https://svn.apache.org/repos/asf/ctakes/trunk/, I can see activity as recent as May 2022.

However, on GitHub, I can only see stale branches:

https://github.com/apache/ctakes/branches

Wouldn't it be good if the GitHub mirror would be kept up-to-date?

Best,

-- Richard




--
Regards,
Gandhi


"The best way to find urself is to lose urself in the service of others !!!"

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID>.
Hi Richard,

Thank you for this information, any and all help that you can provide is greatly appreciated.

>The use of Git and GitHub is well supported by the INFRA team.
-- True. I actually contacted them a year or two ago and they already had mechanisms in place to easily migrate code and hook up CI.  That doesn't really worry me.

>Jenkins also supports GitHub very well [4]. For example, in UIMA, we just drop a `Jenkinsfile` 
> I'm happy to help you setting that up for cTAKES as well.
-- Your assistance would be appreciated.  A while ago when Infra switched Jenkins platforms we lost our configurations (which were kept there) and I had to create new setups on their current platform.  The wizard gui is helpful ... to a point.   Anyway, an editable build configuration stored in our code repo would definitely be an improvement.

>I fear that people may not have svn installed anymore
-- Also very true, and a great reason to get our code into GitHub.

>So requiring svn to download models and drop them into m2 might be an inconvenience.
-- I agree wholeheartedly.  My writing may have been imprecise, but requiring svn to download models was definitely not my intention.

>If the models live in a Maven Repository and can be dragged in as a normal dependency, that would seem most convenient.
--  Yup.  A new model creator could deal with svn and the svn model repo, but the 99.9999% of developers who don't contribute models to ctakes wouldn't need to worry about this.

I hope that we don't let this slip.   It will require some effort with setup and test, and I fear that it may require reorganization of the code and resources such as I have proposed.  It definitely should not be a one-person-job ...  I also think that we need to have a ctakes 5.0 release before any of this is undertaken, which requires the usual planning, effort and cooperation.

Sean


________________________________________
From: Richard Eckart de Castilho <re...@apache.org>
Sent: Tuesday, June 28, 2022 6:54 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

Hi all,

> On 6. Jun 2022, at 16:09, Finan, Sean <Se...@childrens.harvard.edu.INVALID> wrote:
>
> Hi Kean,
>
> Thank you for the suggestion and the link. I am really glad that people are interested in this GitHub topic and taking it seriously. It would be great if we could make it happen.
>
> While definitely a possibility, the git LFS paradigm is something that I would like to avoid.
>
> Like keeping our models on SVN, it would also require separating models from code into two different repos, e.g. github and bitbucket. As opposed to bitbucket, the apache svn repos are long established, familiar to and supported by the apache infrastructure team. The same goes for the apache foundation use of github. I like being able to lean on the apache infra team for help.

So GitHub seems to have support for LFS [1]. What I do not know is whether the ASF's GitHub plan allows us to use this and, if so, whether there is a volume limit. We would have to ask INFRA about that.

The use of Git and GitHub is well supported by the INFRA team. For example, there is self-service for creating and managing repos. [2]

There is also the `.asf.yaml` mechanism for configuring GitHub repos and hooking them up with the ASF infrastructure including mailing lists, website publishing, etc. etc. [3]

> The apache Jenkins servers are linked to the svn repos, making continuous integration easy - on the rare occasion when somebody does change something in a model repo. While I expect anybody savvy enough to work on models to also have the knowhow and wherewithal to work with a separate svn repo, I don't want them to need to get out to jenkins and manually kick off snapshot builds.

Jenkins also supports GitHub very well [4]. For example, in UIMA, we just drop a `Jenkinsfile` [5,6] configuration file into each repo and Jenkins picks it up and even gives us support for pull requests [7].
I'm happy to help you set that up for cTAKES as well.

> Probably most important is the requirement of the client user to have the LFS command line client. I think that there are enough hoops stuck in front of getting ctakes installed/checked out/cloned/etc. and it seems to me that one of the biggest reasons to use github is to make things easier for absolute newbies to just pull down code and experiment.

It is an additional hoop to jump through indeed, but it is a one-time action to install LFS. Chances are that people may even already have it set up because they use it in other repos.

> Keeping the models on a separate svn repo would mean that they aren't checked out as code, but would be put in the .m2 maven area when a user runs maven compile. While the total footprint of full ctakes would still be the same size, it would essentially make the code directory smaller and initial downloads/checkouts would be faster. Plus, if done properly maybe it could "clean up" all of those nearly identically named modules in my intellij project window and I'd stop clicking on the wrong one when I've had too much coffee.

Nowadays, I fear that people may not have svn installed anymore ;) So requiring svn to download models and drop them into m2 might be an inconvenience. If the models live in a Maven Repository and can be dragged in as a normal dependency, that would seem most convenient.

Cheers,

-- Richard

[1] https://docs.github.com/en/repositories/working-with-files/managing-large-files/configuring-git-large-file-storage
[2] https://gitbox.apache.org
[3] https://s.apache.org/asfyaml
[4] https://builds.apache.org/job/UIMA/
[5] https://github.com/apache/uima-uimaj/blob/main/Jenkinsfile
[6] https://github.com/apache/uima-build-jenkins-shared-library
[7] https://builds.apache.org/job/UIMA/job/uima-uimaj/view/change-requests/

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi all,

> On 6. Jun 2022, at 16:09, Finan, Sean <Se...@childrens.harvard.edu.INVALID> wrote:
> 
> Hi Kean,
> 
> Thank you for the suggestion and the link. I am really glad that people are interested in this GitHub topic and taking it seriously. It would be great if we could make it happen.
> 
> While definitely a possibility, the git LFS paradigm is something that I would like to avoid. 
> 
> Like keeping our models on SVN, it would also require separating models from code into two different repos, e.g. github and bitbucket. As opposed to bitbucket, the apache svn repos are long established, familiar to and supported by the apache infrastructure team. The same goes for the apache foundation use of github. I like being able to lean on the apache infra team for help.

So GitHub seems to have support for LFS [1]. What I do not know is whether the ASF's GitHub plan allows us to use this and, if so, whether there is a volume limit. We would have to ask INFRA about that.

The use of Git and GitHub is well supported by the INFRA team. For example, there is self-service for creating and managing repos. [2]

There is also the `.asf.yaml` mechanism for configuring GitHub repos and hooking them up with the ASF infrastructure including mailing lists, website publishing, etc. etc. [3]

> The apache Jenkins servers are linked to the svn repos, making continuous integration easy - on the rare occasion when somebody does change something in a model repo. While I expect anybody savvy enough to work on models to also have the knowhow and wherewithal to work with a separate svn repo, I don't want them to need to get out to jenkins and manually kick off snapshot builds.

Jenkins also supports GitHub very well [4]. For example, in UIMA, we just drop a `Jenkinsfile` [5,6] configuration file into each repo and Jenkins picks it up and even gives us support for pull requests [7].
I'm happy to help you set that up for cTAKES as well.

> Probably most important is the requirement of the client user to have the LFS command line client. I think that there are enough hoops stuck in front of getting ctakes installed/checked out/cloned/etc. and it seems to me that one of the biggest reasons to use github is to make things easier for absolute newbies to just pull down code and experiment.

It is an additional hoop to jump through indeed, but it is a one-time action to install LFS. Chances are that people may even already have it set up because they use it in other repos.

> Keeping the models on a separate svn repo would mean that they aren't checked out as code, but would be put in the .m2 maven area when a user runs maven compile. While the total footprint of full ctakes would still be the same size, it would essentially make the code directory smaller and initial downloads/checkouts would be faster. Plus, if done properly maybe it could "clean up" all of those nearly identically named modules in my intellij project window and I'd stop clicking on the wrong one when I've had too much coffee.

Nowadays, I fear that people may not have svn installed anymore ;) So requiring svn to download models and drop them into m2 might be an inconvenience. If the models live in a Maven Repository and can be dragged in as a normal dependency, that would seem most convenient.
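
As a rough sketch of what that could look like for a downstream project, assuming a models-only artifact were published to a Maven repository (the artifact name and version property below are hypothetical, not existing cTAKES coordinates):

```xml
<!-- Hypothetical: a models-only artifact resolved like any other dependency. -->
<dependency>
  <groupId>org.apache.ctakes</groupId>
  <artifactId>ctakes-example-models</artifactId>
  <version>${ctakes.models.version}</version>
  <!-- Models are only needed when pipelines run, not to compile code. -->
  <scope>runtime</scope>
</dependency>
```

Maven would then download the artifact into the local `~/.m2` repository on the first build, without the user needing an svn or LFS client.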

Cheers,

-- Richard

[1] https://docs.github.com/en/repositories/working-with-files/managing-large-files/configuring-git-large-file-storage
[2] https://gitbox.apache.org
[3] https://s.apache.org/asfyaml
[4] https://builds.apache.org/job/UIMA/
[5] https://github.com/apache/uima-uimaj/blob/main/Jenkinsfile
[6] https://github.com/apache/uima-build-jenkins-shared-library 
[7] https://builds.apache.org/job/UIMA/job/uima-uimaj/view/change-requests/

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID>.
Hi Kean,

Thank you for the suggestion and the link.  I am really glad that people are interested in this GitHub topic and taking it seriously.  It would be great if we could make it happen.

While definitely a possibility, the git LFS paradigm is something that I would like to avoid.  

Like keeping our models on SVN, it would also require separating models from code into two different repos, e.g. github and bitbucket.  As opposed to bitbucket, the apache svn repos are long established, familiar to and supported by the apache infrastructure team.  The same goes for the apache foundation use of github.  I like being able to lean on the apache infra team for help.

The apache Jenkins servers are linked to the svn repos, making continuous integration easy - on the rare occasion when somebody does change something in a model repo.  While I expect anybody savvy enough to work on models to also have the knowhow and wherewithal to work with a separate svn repo, I don't want them to need to get out to jenkins and manually kick off snapshot builds.

Probably most important is the requirement of the client user to have the LFS command line client.  I think that there are enough hoops stuck in front of getting ctakes installed/checked out/cloned/etc. and it seems to me that one of the biggest reasons to use github is to make things easier for absolute newbies to just pull down code and experiment.

Keeping the models on a separate svn repo would mean that they aren't checked out as code, but would be put in the .m2 maven area when a user runs maven compile.  While the total footprint of full ctakes would still be the same size, it would essentially make the code directory smaller and initial downloads/checkouts would be faster.  Plus, if done properly maybe it could "clean up" all of those nearly identically named modules in my intellij project window and I'd stop clicking on the wrong one when I've had too much coffee.

The LFS system is great for people who want to work on (in development) large files, but given the very lopsided ratio of model reuse vs. creation/modification in ctakes I don't think that we need to go that route.

I am only one voice of many, so this is obviously up for debate.  Thanks again,

Sean

________________________________________
From: Kean Kaufmann <ke...@recordsone.com.INVALID>
Sent: Monday, June 6, 2022 9:07 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

Is Git LFS an option?
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Needs an LFS-aware host e.g. Bitbucket; I don't know what the Apache
hosting setup is like.



Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

Posted by Kean Kaufmann <ke...@recordsone.com.INVALID>.
Is Git LFS an option?
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Needs an LFS-aware host e.g. Bitbucket; I don't know what the Apache
hosting setup is like.

