Posted to dev@plc4x.apache.org by Julian Feinauer <j....@pragmaticminds.de> on 2021/02/23 07:59:38 UTC

mspec vs Apache Tika

Hi friends!

I recently had some discussions about PLC4X for different types of protocols and languages (esp. C#).

So I wanted to “rewarm” an old discussion (and also point it out to newer community members).

As you all know, we have this m x n problem (where m is the number of protocols we support and n is the number of languages we support). Some time ago we decided to go the way of code generation to tackle this. Chris made incredible efforts and came up with mspec, which is, I would say today, the new core of PLC4X and one of its biggest assets, as it's just so… cool.

Another alternative we discussed a bit was the approach that Apache Tika [1] took: basically, to have one interface and, behind this interface, integrate all kinds of tools to get the job done, e.g. third-party libs, probably also adapters for proprietary stuff, and so on.
One concrete example of this is the OPC UA protocol. The current implementation done by Matthias is basically an adapter to Eclipse Milo (or a differently named project of Kevin) which only does “mediation” between the full OPC UA stack and the PLC4X frontend.
The other way is currently being engineered mostly by Ben, who is working on integrating OPC UA via mspec so that it is a PLC4X “native” driver.
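
To make the two styles concrete, here is a minimal Go sketch of one unified interface with a "native" driver and an "adapter" driver behind it. All names (`Driver`, `nativeDriver`, `adapterDriver`, `thirdPartyClient`, `describe`) are hypothetical and chosen only for illustration; this is not the PLC4X API and not how the Milo-based driver is actually written.

```go
package main

import (
	"errors"
	"fmt"
)

// Driver is a hypothetical unified read interface; it is not the real PLC4X API.
type Driver interface {
	Read(address string) (string, error)
	Native() bool
}

// nativeDriver stands in for a driver generated from a protocol description.
type nativeDriver struct{ values map[string]string }

func (d nativeDriver) Read(address string) (string, error) {
	v, ok := d.values[address]
	if !ok {
		return "", errors.New("unknown address: " + address)
	}
	return v, nil
}

func (d nativeDriver) Native() bool { return true }

// thirdPartyClient stands in for an existing external library (hypothetical).
type thirdPartyClient struct{}

func (thirdPartyClient) FetchNode(nodeID string) string { return "value-of-" + nodeID }

// adapterDriver only mediates between the unified interface and the external client.
type adapterDriver struct{ client thirdPartyClient }

func (d adapterDriver) Read(address string) (string, error) {
	return d.client.FetchNode(address), nil
}

func (d adapterDriver) Native() bool { return false }

// describe reads one address and reports which kind of driver served it.
func describe(d Driver, address string) string {
	v, err := d.Read(address)
	if err != nil {
		return "error: " + err.Error()
	}
	kind := "non-native"
	if d.Native() {
		kind = "native"
	}
	return fmt.Sprintf("%s (%s)", v, kind)
}

func main() {
	drivers := []Driver{
		nativeDriver{values: map[string]string{"item1": "42"}},
		adapterDriver{},
	}
	for _, d := range drivers {
		fmt.Println(describe(d, "item1"))
	}
}
```

The caller only ever sees `Driver`; whether a driver is marked "native" or "non-native" is just a property it reports.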

There have already been some discussions about which would be better, and I’m totally unemotional about it, but I also see value in the “adapter” approach. For me, PLC4X’s main benefit is the unified API we can provide; that is what makes it awesome.

And the discussion doesn’t need to be an either/or; it can also be both. I agree that a well-written native driver always beats “external” ones. But from a time and material perspective it is usually easier (as a first step) to integrate existing work into the project to get it out there. And we can also mark that on the homepage or elsewhere as “native” or “non-native”.

And regarding our n-languages problem, there already was the idea to come up with a Java-based web server and other language bindings, e.g. via Thrift, Protobuf or gRPC (which were thrown out due to the build complexity they brought). But if someone (looking at you, Josh) is willing to experiment a bit with it, I would definitely be in : )

What are your thoughts on these matters?

Best
Julian

[1] https://tika.apache.org/

Re: mspec vs Apache Tika

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Julian,

I really think the plc4x-mspec protocol would eliminate a lot of problems.

I am planning on finishing this in the next few weeks, as I have until 31.03. to finish this work as part of my EU project.
And with that we'll set up a clean Java server into which we can then plug other language APIs pretty easily.


Chris


Re: mspec vs Apache Tika

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hey,

Well, this is not a clear proposal; I’m just starting to discuss things that I’m currently thinking about : )
As Josh wanted to do a bit of work on C#, we had the idea to “rewarm” the old “java plc server” idea to play around with.

But I agree: once your plc4x protocol is ready, we could also use that as the layer in between. Then we would even get non-POSIX compatibility. I’m totally open here : )

Julian


Re: mspec vs Apache Tika

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Julian,

isn't implementing a native API exactly what we are doing right now?

Also, I don't think I quite understand what you are actually proposing ... we use mspec to generate the code for the model serializer and parser in a given language ... that's a part the user never actually has contact with.

The only part where I'm currently working on a different usage for mspec is where I'm trying to replace what we had planned with Thrift and then gRPC with a custom PLC4X internal protocol. The main reason is that none of the other approaches actually has C support. So this might be something that I'll only use to communicate with C PLC4X agents.

But I don't think you are referring to that.

Chris



Re: mspec vs Apache Tika

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hey,

I totally agree with you that the “one api” is something language-specific (that’s e.g. the way we currently do it in Apache IoTDB).

So in the case of a “Language Server” there would be the need to write a “native” API for every language, because the default would be something like (in pseudocode):

Request {
    item[]
}

That would be sent via RPC. But the “API” for the user should of course be language-specific and look like what you describe here.
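
As a rough illustration of the pseudocode above, the following Go sketch models the request as a plain list of items and round-trips it through a wire encoding. JSON is used purely as a stand-in for whatever transport would be chosen (Thrift, Protobuf, gRPC or a custom protocol), and the field names `Name` and `Address` as well as the helpers `encode` and `decode` are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item and Request mirror the pseudocode above; the field names are assumptions.
type Item struct {
	Name    string `json:"name"`
	Address string `json:"address"`
}

type Request struct {
	Items []Item `json:"items"`
}

// encode turns the request into the bytes a client would hand to its RPC transport.
func encode(r Request) ([]byte, error) {
	return json.Marshal(r)
}

// decode is what a (hypothetical) server would do with the received bytes.
func decode(data []byte) (Request, error) {
	var r Request
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	req := Request{Items: []Item{
		{Name: "item1", Address: "lalala"},
		{Name: "item2", Address: "hurz"},
	}}

	wire, err := encode(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(wire))

	back, err := decode(wire)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(back.Items), "items received")
}
```

A language-specific API (builders, channels, and so on) would then sit on top of this flat structure in each client language.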

But as I said, I don’t see this as an either/or discussion but rather as a way to bring most features “easily” and “quickly” to all other protocols at a cost (the cost is the need to run a Java-based server somewhere).
This does not conflict with the current (“native”) drivers that are really “embeddable” in such a project and often the better solution.

WDYT?

Julian


Re: mspec vs Apache Tika

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Julian,

having now worked on PLC4J, PLC4Go, PLC4C, PLC4Net and PLC4Cpp, I think I can add some more insight to this discussion.

The main problem is the availability of concepts in different languages. Right now, for example, we couldn't do the cool chaining of stuff we use in the PLC4J API:

	PlcReadRequest readRequest = plcConnection.readRequestBuilder().addItem("item1", "lalala").addItem("item2", "hurz").build();

Go simply doesn't support this. Here this has to look different:

	rrb := connection.ReadRequestBuilder()
	rrb.AddItem("item1", "lalala")
	rrb.AddItem("item2", "hurz")
	readRequest, err := rrb.Build()
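
For readers who want to run the Go style shown above, here is a minimal self-contained sketch. The types `ReadRequest` and `ReadRequestBuilder` and the constructor `NewReadRequestBuilder` are hypothetical stand-ins invented for this example; they are not the actual PLC4Go types, and the validation rules in `Build` are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// ReadRequest is a hypothetical stand-in, not the PLC4Go type.
type ReadRequest struct {
	Items map[string]string // item name -> address
}

// ReadRequestBuilder collects items statement by statement.
type ReadRequestBuilder struct {
	names     []string
	addresses map[string]string
}

func NewReadRequestBuilder() *ReadRequestBuilder {
	return &ReadRequestBuilder{addresses: map[string]string{}}
}

// AddItem records one named address; it returns nothing, so calls are not chained.
func (b *ReadRequestBuilder) AddItem(name, address string) {
	b.names = append(b.names, name)
	b.addresses[name] = address
}

// Build validates the collected items and returns the request or an error.
func (b *ReadRequestBuilder) Build() (ReadRequest, error) {
	if len(b.names) == 0 {
		return ReadRequest{}, errors.New("read request has no items")
	}
	if len(b.names) != len(b.addresses) {
		return ReadRequest{}, errors.New("duplicate item name")
	}
	items := make(map[string]string, len(b.addresses))
	for name, address := range b.addresses {
		items[name] = address
	}
	return ReadRequest{Items: items}, nil
}

func main() {
	rrb := NewReadRequestBuilder()
	rrb.AddItem("item1", "lalala")
	rrb.AddItem("item2", "hurz")
	readRequest, err := rrb.Build()
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(len(readRequest.Items), "items")
}
```

Returning the error from `Build` follows the Go convention of explicit error values instead of exceptions, which is one reason the call site reads differently from the Java one-liner.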

And in C, again, none of these would work that way (but I wouldn't let C block anything).

This "one api" approach would probably only work if we reduced the API to the simplest possible variant, throwing the language-specific benefits overboard:

	- Builder pattern in Java (and this is just one cool example)
	- Service discovery in Java
	- Goroutines and channels in Go
	- Size of everything in C

Just my input to the discussion.


Chris



