Posted to dev@tika.apache.org by Fossies Administrator <Je...@fossies.org> on 2019/12/10 15:51:50 UTC
Codespell report for Tika 1.23
Hi,
the FOSS server fossies.org offers a new feature "Source code misspelling
reports":
https://fossies.org/features.html#codespell
Although such reports are normally only generated on request, as Fossies
administrator I have just created (for testing purposes) an analysis for
the current release Tika 1.23:
https://fossies.org/linux/misc/tika/codespell.html
That version-independent URL should always redirect to the latest report
(if available), so currently to
https://fossies.org/linux/misc/tika-1.23-src.zip/codespell.html
Although some obviously wrong matches ("false positives") are already
filtered (ignored), please inform me if you find more of them so that I can
trigger a new, improved check if applicable.
For information, there are also two supplemental pages:
https://fossies.org/linux/misc/tika/codespell_conf.html
showing some of the "codespell" configuration used, and
https://fossies.org/linux/misc/tika/codespell_fps.html
showing all resulting obvious "false positives".
Regards
Jens
--
FOSSIES - The Fresh Open Source Software archive
mainly for Internet, Engineering and Science
https://fossies.org/
Re: Codespell report for Tika 1.23
Posted by Tim Allison <ta...@apache.org>.
Ooooooo....dig it...fixing now. Thank you, Jens!
Re: Codespell report for Tika 1.23
Posted by Fossies Administrator <Je...@fossies.org>.
On Wed, 25 Dec 2019, Tilman Hausherr wrote:
> Hello Jens,
> Thank you again, I have corrected all I wanted to, and created one issue for
> a false positive
> https://github.com/codespell-project/codespell/issues/1399
> Tilman
Yes, that is a false positive, but I assume that the issue isn't easy to
solve: "codespell" claims to be "designed primarily for checking
misspelled words in source code", but its context recognition currently
seems to leave room for improvement.
So it is rather my error, made while manually pre-checking for false positives.
I now also ignore "endianess" and "instanciate", and the current result
(with the very good rating grade "A") can be found here:
https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
https://fossies.org/linux/test/pdfbox-trunk-a6bc826.191225.zip/codespell.html
Regards
Jens
Re: Codespell report for Tika 1.23
Posted by Tilman Hausherr <TH...@t-online.de>.
Hello Jens,
Thank you again, I have corrected all I wanted to, and created one issue
for a false positive
https://github.com/codespell-project/codespell/issues/1399
Tilman
On 15.12.2019 at 16:33, Fossies Administrator wrote:
> Hi Tilman,
>
>> Thank you! I've now corrected all typos except those related to
>> variable / method names (want to keep API stability), "Cloneable
>> <https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html#Cloneable>"
>> (that is in java itself LOL) and a few that are in resource files
>> (these are text extractions, i.e. the typos are in the original PDF,
>> e.g. PDFBOX-3044-010197-p5-ligatures.pdf).
>
> Oops, I overlooked that file, and "Cloneable" is now also ignored.
>
>> Yes, I would like to have a report for the trunk too, although I
>> don't expect many new typos.
>
> A new "false positive" word "hIST" is now ignored, but for better
> comparability I have left all others unchanged.
>
> Here are the main URLs for trunk, checked out today (Sunday) at 14:59 CET.
>
> https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
> https://fossies.org/linux/test/pdfbox-trunk.191215_1459.zip/codespell.html
>
>
> Looks much better!
>
> Regards
>
> Jens
>
>> On 11.12.2019 at 21:50, Fossies Administrator wrote:
>>> Hi Tilman,
>>>
>>>> On 10.12.2019 at 16:51, Fossies Administrator wrote:
>>>>> Although such reports are normally only generated on request
>>>>
>>>>
>>>> Hello, can we also get this for Apache PDFBox? I've corrected typos
>>>> when I hit them, but I can't look everywhere.
>>>>
>>>> https://github.com/apache/pdfbox/
>>>>
>>>> or
>>>>
>>>> https://svn.apache.org/repos/asf/pdfbox/
>>>>
>>>> PDFBox is used by the Tika project, and the two projects have
>>>> people in common.
>>>
>>> Although Fossies now also has the possibility to create such reports
>>> in a special test folder that isn't integrated into the Fossies
>>> standard services and should hopefully also not be accessible by
>>> search engines, that package is now included in the main Fossies
>>> folder "/linux/misc":
>>>
>>> https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/
>>>
>>> The corresponding codespell URLs are
>>>
>>> https://fossies.org/linux/misc/pdfbox/codespell.html
>>>
>>> currently redirecting to
>>>
>>> https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html
>>>
>>> and
>>>
>>> https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_conf.html
>>>
>>> https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_fps.html
>>>
>>>
>>> If it would be meaningful to do a codespell check for, e.g., the
>>> "trunk" version, let me know and I can do that in the mentioned
>>> "/linux/test" folder.
>>>
>>> Regards
>>>
>>> Jens
>
Re: Codespell report for Tika 1.23
Posted by Fossies Administrator <Je...@fossies.org>.
Hi Tilman,
> Thank you! I've now corrected all typos except those related to variable /
> method names (want to keep API stability), "Cloneable
> <https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html#Cloneable>"
> (that is in java itself LOL) and a few that are in resource files (these are
> text extractions, i.e. the typos are in the original PDF, e.g.
> PDFBOX-3044-010197-p5-ligatures.pdf).
Oops, I overlooked that file, and "Cloneable" is now also ignored.
> Yes, I would like to have a report for the trunk too, although I don't expect
> many new typos.
A new "false positive" word "hIST" is now ignored, but for better
comparability I have left all others unchanged.
Here are the main URLs for trunk, checked out today (Sunday) at 14:59 CET.
https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
https://fossies.org/linux/test/pdfbox-trunk.191215_1459.zip/codespell.html
Looks much better!
Regards
Jens
Re: Codespell report for Tika 1.23
Posted by Tilman Hausherr <TH...@t-online.de>.
Hello Jens,
Thank you! I've now corrected all typos except those related to variable
/ method names (want to keep API stability), "Cloneable
<https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html#Cloneable>"
(that is in java itself LOL) and a few that are in resource files (these
are text extractions, i.e. the typos are in the original PDF, e.g.
PDFBOX-3044-010197-p5-ligatures.pdf).
Yes, I would like to have a report for the trunk too, although I don't
expect many new typos.
Thanks
Tilman
Re: Codespell report for Tika 1.23
Posted by Fossies Administrator <Je...@fossies.org>.
Hi Tilman,
> On 10.12.2019 at 16:51, Fossies Administrator wrote:
>> Although such reports are normally only generated on request
>
>
> Hello, can we also get this for Apache PDFBox? I've corrected typos when I
> hit them, but I can't look everywhere.
>
> https://github.com/apache/pdfbox/
>
> or
>
> https://svn.apache.org/repos/asf/pdfbox/
>
> PDFBox is used by the Tika project, and the two projects have people in
> common.
Although Fossies now also has the possibility to create such reports in a
special test folder that isn't integrated into the Fossies standard services
and should hopefully also not be accessible by search engines, that package
is now included in the main Fossies folder "/linux/misc":
https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/
The corresponding codespell URLs are
https://fossies.org/linux/misc/pdfbox/codespell.html
currently redirecting to
https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html
and
https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_conf.html
https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_fps.html
If it would be meaningful to do a codespell check for, e.g., the "trunk"
version, let me know and I can do that in the mentioned "/linux/test"
folder.
Regards
Jens
Re: Codespell report for Tika 1.23
Posted by Tilman Hausherr <TH...@t-online.de>.
On 10.12.2019 at 16:51, Fossies Administrator wrote:
> Although such reports are normally only generated on request
Hello, can we also get this for Apache PDFBox? I've corrected typos when
I hit them, but I can't look everywhere.
https://github.com/apache/pdfbox/
or
https://svn.apache.org/repos/asf/pdfbox/
PDFBox is used by the Tika project, and the two projects have people in
common.
Tilman