Posted to dev@tika.apache.org by Fossies Administrator <Je...@fossies.org> on 2019/12/10 15:51:50 UTC

Codespell report for Tika 1.23

Hi,

the FOSS server fossies.org offers a new feature "Source code misspelling 
reports":

  https://fossies.org/features.html#codespell

Although such reports are normally only generated on request, as Fossies 
administrator I have just created (for testing purposes) an analysis for 
the current release Tika 1.23:

  https://fossies.org/linux/misc/tika/codespell.html

That version-independent URL should always redirect to the latest report
(if available), which is currently

  https://fossies.org/linux/misc/tika-1.23-src.zip/codespell.html

Although some obviously wrong matches ("false positives") are already
filtered out (ignored), please let me know if you find more of them so that
I can run a new, improved check if applicable.
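To illustrate what filtering known false positives out of such a report might look like, here is a minimal Python sketch. It assumes codespell's plain-text output format `path:line: word ==> suggestion`. The function name `filter_false_positives` and the sample entries are hypothetical and do not describe how Fossies actually does it.

```python
import re

# Assumed report-line format (codespell's plain-text output):
#   path:line: word ==> suggestion(s)
REPORT_LINE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+): (?P<word>\S+) ==> (?P<fix>.+)$"
)


def filter_false_positives(lines, ignore_words):
    """Yield (path, line, word, fix) for entries whose word is not ignored.

    Words are compared case-insensitively; that is a choice made for this
    sketch, not necessarily how Fossies or codespell compare them.
    """
    ignored = {word.lower() for word in ignore_words}
    for raw in lines:
        match = REPORT_LINE.match(raw.strip())
        if match is None:
            continue  # not a report entry (blank line, summary, ...)
        if match.group("word").lower() in ignored:
            continue  # word is on the ignore list
        yield (
            match.group("path"),
            int(match.group("line")),
            match.group("word"),
            match.group("fix"),
        )


# Usage with hypothetical report entries:
report = [
    "src/Foo.java:12: endianess ==> endianness",
    "src/Bar.java:40: possibilty ==> possibility",
    "src/Baz.java:7: instanciate ==> instantiate",
]
for entry in filter_false_positives(report, ["endianess", "instanciate"]):
    print(entry)
```

Keeping the ignore list separate from the parser means new false positives can be added without touching the parsing logic.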

For your information, there are also two supplemental pages

  https://fossies.org/linux/misc/tika/codespell_conf.html

showing some of the "codespell" configurations used, and

  https://fossies.org/linux/misc/tika/codespell_fps.html

showing all resulting obvious "false positives".

Regards

Jens

-- 
FOSSIES - The Fresh Open Source Software archive
mainly for Internet, Engineering and Science
https://fossies.org/

Re: Codespell report for Tika 1.23

Posted by Tim Allison <ta...@apache.org>.
Ooooooo....dig it...fixing now.  Thank you, Jens!

Re: Codespell report for Tika 1.23

Posted by Fossies Administrator <Je...@fossies.org>.
On Wed, 25 Dec 2019, Tilman Hausherr wrote:

> Hello Jens,
> Thank you again, I have corrected all I wanted to, and created one issue for 
> a false positive
> https://github.com/codespell-project/codespell/issues/1399
> Tilman

Yes, that is a false positive, but I assume the issue isn't easy to
solve: "codespell" claims to be "designed primarily for checking
misspelled words in source code", yet its context recognition currently
seems to leave room for improvement.

So it was rather my mistake while manually pre-checking for false positives.
I now also ignore "endianess" and "instanciate", and the current result
(with the very good rating grade "A") can be found here:

  https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
  https://fossies.org/linux/test/pdfbox-trunk-a6bc826.191225.zip/codespell.html

Regards

Jens

Re: Codespell report for Tika 1.23

Posted by Tilman Hausherr <TH...@t-online.de>.
Hello Jens,
Thank you again, I have corrected all I wanted to, and created one issue 
for a false positive
https://github.com/codespell-project/codespell/issues/1399
Tilman

Re: Codespell report for Tika 1.23

Posted by Fossies Administrator <Je...@fossies.org>.
Hi Tilman,

> Thank you! I've now corrected all typos except those related to variable /
> method names (I want to keep API stability), "Cloneable
> <https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html#Cloneable>"
> (that one is in Java itself, LOL), and a few that are in resource files
> (these are text extractions, i.e. the typos are in the original PDF, e.g.
> PDFBOX-3044-010197-p5-ligatures.pdf).

Oops, I had overlooked that file; "Cloneable" is now also ignored.

> Yes, I would like to have a report for the trunk too, although I don't expect
> many new typos.

A new false positive, the word "hIST", is now ignored, but for better
comparability I have left everything else unchanged.

Here are the main URLs for the trunk, checked out today (Sunday) at 14:59 CET.

  https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
  https://fossies.org/linux/test/pdfbox-trunk.191215_1459.zip/codespell.html

Looks much better!

Regards

Jens

Re: Codespell report for Tika 1.23

Posted by Tilman Hausherr <TH...@t-online.de>.
Hello Jens,

Thank you! I've now corrected all typos except those related to variable
/ method names (I want to keep API stability), "Cloneable
<https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html#Cloneable>"
(that one is in Java itself, LOL), and a few that are in resource files
(these are text extractions, i.e. the typos are in the original PDF, e.g.
PDFBOX-3044-010197-p5-ligatures.pdf).
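Excluding such resource files by path, rather than word by word, could look like the following Python sketch. The glob pattern and the file paths are hypothetical examples, not PDFBox's actual layout or configuration; codespell's own `--skip` option accepts comparable comma-separated glob patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern: test resources holding text extracted from PDFs,
# whose "typos" come from the original documents rather than from the code.
SKIP_PATTERNS = ["*/src/test/resources/*"]


def should_check(path, skip_patterns=SKIP_PATTERNS):
    """Return True if `path` should be spell-checked.

    fnmatchcase keeps matching case-sensitive on every platform; note that
    "*" also matches "/" in fnmatch-style patterns.
    """
    return not any(fnmatchcase(path, pattern) for pattern in skip_patterns)


# Usage with hypothetical paths:
paths = [
    "pdfbox/src/main/java/org/example/Parser.java",
    "pdfbox/src/test/resources/input/example-ligatures.pdf.txt",
]
print([p for p in paths if should_check(p)])
```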

Yes, I would like to have a report for the trunk too, although I don't
expect many new typos.

Thanks
Tilman

Re: Codespell report for Tika 1.23

Posted by Fossies Administrator <Je...@fossies.org>.
Hi Tilman,

> On 10.12.2019 at 16:51, Fossies Administrator wrote:
>>  Although such reports are normally only generated on request
>
>
> Hello, can we also get this for Apache PDFBox? I've corrected typos when I 
> hit them, but I can't look everywhere.
>
> https://github.com/apache/pdfbox/
>
> or
>
> https://svn.apache.org/repos/asf/pdfbox/
>
> PDFBox is used by the Tika project, and the two projects have people in
> common.

Although Fossies now also has the possibility to create such reports in a
special test folder that isn't integrated into the standard Fossies services
(and hopefully isn't accessible to search engines either), that package
is now included in the main Fossies folder "/linux/misc":

  https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/

The corresponding codespell URLs are

  https://fossies.org/linux/misc/pdfbox/codespell.html

currently redirecting to

  https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html

and

  https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_conf.html
  https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_fps.html

If a codespell check of, e.g., the "trunk" version would be useful, let me
know and I can do that in the test folder mentioned above ("/linux/test").

Regards

Jens

Re: Codespell report for Tika 1.23

Posted by Tilman Hausherr <TH...@t-online.de>.
On 10.12.2019 at 16:51, Fossies Administrator wrote:
> Although such reports are normally only generated on request


Hello, can we also get this for Apache PDFBox? I've corrected typos when 
I hit them, but I can't look everywhere.

https://github.com/apache/pdfbox/

or

https://svn.apache.org/repos/asf/pdfbox/

PDFBox is used by the Tika project, and the two projects have people in
common.

Tilman