Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/03/03 03:03:04 UTC

[jira] [Closed] (TIKA-993) Language Detection Fault

     [ https://issues.apache.org/jira/browse/TIKA-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich closed TIKA-993.
--------------------------------
    Resolution: Cannot Reproduce

This issue is >2 years old and has no attachment for the text. So, I'm closing as Cannot Reproduce. If you still have the text, please reopen!

> Language Detection Fault
> ------------------------
>
>                 Key: TIKA-993
>                 URL: https://issues.apache.org/jira/browse/TIKA-993
>             Project: Tika
>          Issue Type: Bug
>          Components: languageidentifier
>            Reporter: Iman Reihanian
>         Attachments: DetectorImpl.java
>
>
> This text is in English, but it is detected as Italian.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [jira] [Closed] (TIKA-993) Language Detection Fault

Posted by Oleg Tikhonov <ol...@gmail.com>.
The first one found. In this case that will be German. The expected result
is a topic for discussion. I would expect to get both detected languages;
however, that is beyond Tika's language detection.

Bottom line: let it stay as is until Ken's implementation.
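
To illustrate the mixed-language case discussed in this thread, here is a
minimal, self-contained Java sketch. It does not use Tika's language
identifier or its algorithm; the class name PerSegmentLanguageSketch, the
methods detect and detectPerParagraph, and the tiny stop-word lists are all
hypothetical stand-ins. It only shows why a detector that returns one answer
per document reports a single language for a German-plus-French file, and how
running detection per paragraph could surface both.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PerSegmentLanguageSketch {

    // Toy stop-word lists, for illustration only (an assumption of this sketch).
    private static final Map<String, Set<String>> STOP_WORDS = new LinkedHashMap<>();
    static {
        STOP_WORDS.put("de", new HashSet<>(Arrays.asList(
                "der", "die", "das", "und", "ist", "nicht", "ein")));
        STOP_WORDS.put("fr", new HashSet<>(Arrays.asList(
                "le", "la", "les", "et", "est", "une", "dans")));
    }

    // Hypothetical single-answer detector: returns the language whose
    // stop words occur most often, or "unknown" if none occur.
    static String detect(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOP_WORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Runs the detector on each blank-line-separated paragraph and collects
    // the distinct languages in order of first appearance.
    static Set<String> detectPerParagraph(String document) {
        Set<String> languages = new LinkedHashSet<>();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            if (paragraph.trim().isEmpty()) {
                continue;
            }
            String language = detect(paragraph);
            if (!language.equals("unknown")) {
                languages.add(language);
            }
        }
        return languages;
    }

    public static void main(String[] args) {
        String document =
                "Der Hund ist nicht im Haus und die Katze ist ein Tier.\n\n"
              + "Le chat est dans la maison et les enfants jouent.";
        System.out.println(detect(document));             // prints "de"
        System.out.println(detectPerParagraph(document)); // prints "[de, fr]"
    }
}
```

Splitting on paragraphs is only one possible segmentation; a fixed-size
window or sentence splitting would serve the same purpose in this sketch.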
On 3 Mar 2015 09:09, "Tyler Palsulich" <tp...@gmail.com> wrote:

> Hi,
>
> What do you mean, the detection is faulty? What is the expected result in
> that case?
>
> Thanks,
> Tyler

Re: [jira] [Closed] (TIKA-993) Language Detection Fault

Posted by Tyler Palsulich <tp...@gmail.com>.
Hi,

What do you mean, the detection is faulty? What is the expected result in
that case?

Thanks,
Tyler
On Mar 3, 2015 1:10 AM, "Oleg Tikhonov" <ol...@apache.org> wrote:

> Hi,
> Just for the record ...
> It can happen if a file contains content written in at least two
> different languages. For instance, the first half of the file is, say, in
> German and the second half is, say, in French. In such a case detection
> would be faulty.
>
> Br,
> Oleg

Re: [jira] [Closed] (TIKA-993) Language Detection Fault

Posted by Oleg Tikhonov <ol...@apache.org>.
Hi,
Just for the record ...
It can happen if a file contains content written in at least two
different languages. For instance, the first half of the file is, say, in
German and the second half is, say, in French. In such a case detection
would be faulty.

Br,
Oleg
On 3 Mar 2015 04:03, "Tyler Palsulich (JIRA)" <ji...@apache.org> wrote:

>
>      [ https://issues.apache.org/jira/browse/TIKA-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Tyler Palsulich closed TIKA-993.
> --------------------------------
>     Resolution: Cannot Reproduce
>
> This issue is >2 years old and has no attachment for the text. So, I'm
> closing as Cannot Reproduce. If you still have the text, please reopen!
>
> > Language Detection Fault
> > ------------------------
> >
> >                 Key: TIKA-993
> >                 URL: https://issues.apache.org/jira/browse/TIKA-993
> >             Project: Tika
> >          Issue Type: Bug
> >          Components: languageidentifier
> >            Reporter: Iman Reihanian
> >         Attachments: DetectorImpl.java
> >
> >
> > This text is in English, but it is detected as Italian.