Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2015/11/26 08:12:19 UTC

[Tika Wiki] Update of "AdityaDhulipala" by ChrisMattmann

The "AdityaDhulipala" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/AdityaDhulipala?action=diff&rev1=2&rev2=3

  
  === Pooled Time Series parser for Tika ===
  
- I'm working on integrating Dr. Ryoo's research work into Tika
+ I'm working on [[PooledTimeSeriesParser|integrating Dr. Ryoo's research work into Tika]].
- [[http://michaelryoo.com/jpl-interaction.html]]
- [[https://github.com/chrismattmann/pooled_time_series]]
- [[http://arxiv.org/pdf/1412.6505v2.pdf]]
- 
- ==== Metadata Representation ====
- 
- The ultimate goal of the project is to extract metadata from videos and index it in Solr.
- 
- Videos, like images, are just numbers: an ordered sequence of numbers, or matrices.
- 
- There are many ways in which these numbers can be defined.
- Some popular visual descriptors are Histogram of Oriented Gradients (HOG), optical flow vectors, and RGB or color histograms.
- The challenge is to figure out a way to map this datatype to a datatype that can be understood by Solr.
- 
- In the case of color-based histograms, we can convert the image into a matrix of hex values, where each hex value is a pixel's color value,
- and index that as a text_ws field in Solr.
- 
- This is what Shutterstock did for an image search tool they built:
- https://lucidworks.com/blog/shutterstock-searches-35-million-images-color-using-apache-solr/
- 
- Another idea I was thinking of was to index the data as an XHTML document of table values,
- where each <tr>..</tr> would be a row of the feature matrix and each <td>..</td> would hold the element in the corresponding column.
- 
- However, when ranking or querying we would have to compute a distance function over these values (between the videos in the dataset and the query video).
- 
- How have other users solved this problem? There must be instances of matrix type data showing up in other domains, 
- such as geography, physics and other scientific domains. How is the metadata designed in such cases?
- 
- 
  
  ----
  CategoryHomepage