Posted to dev@tika.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2015/04/22 14:47:17 UTC

comparing Tika's file detect with other tools?

Would it be frowned upon to compare Tika's file detection with other tools, like "file"?  Any concerns about effectively reverse engineering (when we find that Tika is wrong) from a non-Apache project?

Any other sensitivities I should be aware of?

Best,

              Tim

RE: comparing Tika's file detect with other tools?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Ken, 
  Thank you.

Tyler,
  I don't know why I had missed that issue.  Thank you!  Do we need to worry about licensing issues if we effectively copy and paste from /usr/share/misc/magic (say, on RHEL)?  I didn't see a license in the file, so I guess it is in the public domain?

  I realize that we can't just copy and paste wholesale based on Nick's points, but for those that we can "re-implement" by our methods, can we use that file?

         Best,

                    Tim


RE: comparing Tika's file detect with other tools?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Oops, our emails passed in the ether.  Thank you, Jukka!


Re: comparing Tika's file detect with other tools?

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

Copyright also covers databases, so we'll need to honor the license
terms equally when copying file's code or detection patterns. Luckily
file (from http://www.darwinsys.com/file/) comes under a BSD license,
so reusing the code or data is quite simple from a licensing
perspective. In fact we've already done some of that earlier, see
https://github.com/apache/tika/commit/f807af0ee947affd34d84b334bbdc32c11576b2e
for an example.

BR,

Jukka

Re: comparing Tika's file detect with other tools?

Posted by Tyler Palsulich <tp...@gmail.com>.
Hi Tim,

I do not know whether there would be licensing concerns, but we do have
TIKA-289 to track merging magic bytes from `file` into Tika.

Tyler


RE: comparing Tika's file detect with other tools?

Posted by Ken Krugler <kk...@transpac.com>.
Hi Tim,

I don't believe there's any issue with comparing results.

If you were looking at the source for "file", then it gets more gray, but I think even that would be OK as long as you weren't copying code or directly re-implementing algorithms.

-- Ken



--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr