Posted to dev@tika.apache.org by Nicholas DiPiazza <ni...@gmail.com> on 2021/01/13 16:28:02 UTC

Looking for PR code review for DWG parser changes

Looking for code review of:

https://github.com/apache/tika/pull/395

This addresses TIKA-1735 and also adds the ability for the DWG parser to
use the LibreDWG library when it is configured.

The DWG reading code is far too large and complex to port to Java.
So, similar to how we handle Tesseract, if DWGConfig.properties is present on
the classpath and contains a valid path to the dwgread executable, the parser
calls dwgread to extract text from DWG files.
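
To make the lookup concrete, here is a minimal sketch of the idea, not the
code in the PR. The property key "dwgread.path", the class name, and the bare
"dwgread <file>" invocation are all assumptions for illustration; the actual
key and command-line arguments may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

final class DwgReadSketch {

    // Hypothetical property key; the key used in the PR may be named differently.
    static final String DWGREAD_KEY = "dwgread.path";

    /** Loads DWGConfig.properties from the classpath; returns empty Properties if absent. */
    public static Properties loadConfig() throws IOException {
        Properties props = new Properties();
        try (InputStream in = DwgReadSketch.class.getClassLoader()
                .getResourceAsStream("DWGConfig.properties")) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    /** Returns the configured path only if it points at an executable file. */
    public static Optional<Path> resolveDwgRead(Properties props) {
        String value = props.getProperty(DWGREAD_KEY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        Path exe = Paths.get(value.trim());
        return Files.isExecutable(exe) ? Optional.of(exe) : Optional.empty();
    }

    /** Runs the external tool on one file and returns its combined stdout/stderr. */
    public static String runDwgRead(Path exe, Path dwgFile, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Redirect to a temp file so a chatty process cannot block on a full pipe.
        Path out = Files.createTempFile("dwgread", ".out");
        try {
            // Assumption: the tool accepts the DWG file as its only argument.
            Process p = new ProcessBuilder(exe.toString(), dwgFile.toString())
                    .redirectErrorStream(true)
                    .redirectOutput(out.toFile())
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("dwgread timed out after " + timeoutSeconds + "s");
            }
            return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(out);
        }
    }

    public static void main(String[] args) throws Exception {
        Optional<Path> exe = resolveDwgRead(loadConfig());
        if (exe.isPresent() && args.length > 0) {
            System.out.println(runDwgRead(exe.get(), Paths.get(args[0]), 60));
        } else {
            System.out.println("dwgread not configured; falling back to the built-in parser");
        }
    }
}
```

With no DWGConfig.properties on the classpath, or with a path that is not an
executable file, resolveDwgRead returns an empty Optional and no process is
spawned, so the external tool stays strictly optional.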

The unit tests still need some love. Is there some way we can get
LibreDWG installed on the Jenkins server so that it can actually run the
tests that exercise dwgread?

-Nicholas

Re: Looking for PR code review for DWG parser changes

Posted by Nicholas DiPiazza <ni...@gmail.com>.
Definitely take your time! No pressure from my end, and I appreciate all
that you do for this project!

On Wed, Jan 13, 2021 at 2:48 PM Tim Allison <ta...@apache.org> wrote:

> Nicholas,
>
>   I'm really grateful for your PR.  Once I roll 2.0.0-ALPHA, I'll have time
> to take a look.  I'm out a bit next week...so might not be until towards
> the end of next week.
>
>   If there are other devs who want to take this, please do.
>
>   Please don't take my lack of response as a failure of gratitude. :)
>
> Cheers,
>
>               Tim
>
> On Wed, Jan 13, 2021 at 11:28 AM Nicholas DiPiazza <
> nicholas.dipiazza@gmail.com> wrote:
>
> > Looking for code review of:
> >
> > https://github.com/apache/tika/pull/395
> >
> > This addresses TIKA-1735 and it also adds the ability for the dwg parser to
> > utilize the LibreDWG library if it is configured.
> >
> > The DWG reading code is much too vast and complex to hope to port to Java.
> > So similar to how we do tesseract, if DWGConfig.properties is present on
> > the classpath and contains a valid path to the dwgread executable, it will
> > call dwgread to extract text from DWG files.
> >
> > In terms of unit tests - need some love there. Is there some way we can get
> > libre DWG installed on the jenkins server so that it can actually run the
> > tests that exercise dwgread?
> >
> > -Nicholas
> >
>

Re: Looking for PR code review for DWG parser changes

Posted by Tim Allison <ta...@apache.org>.
Nicholas,

  I'm really grateful for your PR.  Once I roll 2.0.0-ALPHA, I'll have time
to take a look.  I'm out a bit next week...so might not be until towards
the end of next week.

  If there are other devs who want to take this, please do.

  Please don't take my lack of response as a failure of gratitude. :)

Cheers,

              Tim

On Wed, Jan 13, 2021 at 11:28 AM Nicholas DiPiazza <
nicholas.dipiazza@gmail.com> wrote:

> Looking for code review of:
>
> https://github.com/apache/tika/pull/395
>
> This addresses TIKA-1735 and it also adds the ability for the dwg parser to
> utilize the LibreDWG library if it is configured.
>
> The DWG reading code is much too vast and complex to hope to port to Java.
> So similar to how we do tesseract, if DWGConfig.properties is present on
> the classpath and contains a valid path to the dwgread executable, it will
> call dwgread to extract text from DWG files.
>
> In terms of unit tests - need some love there. Is there some way we can get
> libre DWG installed on the jenkins server so that it can actually run the
> tests that exercise dwgread?
>
> -Nicholas
>