Overview

While investigating Archivematica, I wanted to verify some aspects of how the File Information Tool Set (FITS) behaves, so I tried running it with Docker. This post is a memo of that process.

Installation

The FITS manual describes how to install it with Docker.

However, the download page referenced in the manual did not offer the latest release (1.6.0), which is the one that includes the Dockerfile.

Instead, I downloaded the latest zip file from the project's GitHub releases page.

I then extracted it and built the image by following the README instructions.

However, on an M1 Mac, running the steps as described produced the following error.

% docker run --rm -v `pwd`:/work fits -i fits.sh
2024-01-26 11:41:10 - ERROR - MediaInfo:95 - Error loading native library for this operating system for tool: MediaInfo. ostype=[Linux] -- jvmModel=[64] -- nativeLibPath=[/opt/fits/tools/mediainfo/linux] -- No native MediaInfo library for this OS
java.lang.UnsatisfiedLinkError: Unable to load library 'mediainfo':
libmediainfo.so: cannot open shared object file: No such file or directory
libmediainfo.so: cannot open shared object file: No such file or directory
/opt/fits/tools/mediainfo/linux/libmediainfo.so.0: cannot open shared object file: No such file or directory
...

When I asked ChatGPT 4 about this, it suggested adding the following to the Dockerfile.

RUN apt-get update && \
    apt-get install -yqq \
    # Other dependencies
    mediainfo libmediainfo-dev \
    && rm -rf /var/lib/apt/lists/*

After I added this and rebuilt the image, FITS ran correctly.

Trying It Out

Because I wanted to test a file with Japanese characters in its name, I used "すごくわかる著作権と授業.pdf" ("A Very Understandable Guide to Copyright and Classes", Hiroshima University Information Media Education Research Center), which is published online under a CC BY license.

https://www.media.hiroshima-u.ac.jp/wp-content/uploads/2023/05/すごくわかる著作権と授業.pdf

I then ran the following command.

docker run --rm -v `pwd`:/work fits -i すごくわかる著作権と授業.pdf

This produced the following output.

<?xml version="1.0" encoding="UTF-8"?>
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="1.6.0" timestamp="1/26/24, 12:49 PM">
  <identification>
    <identity format="PDF/X" mimetype="application/pdf" toolname="FITS" toolversion="1.6.0">
      <tool toolname="Droid" toolversion="6.5.2" />
      <tool toolname="Exiftool" toolversion="12.50" />
      <tool toolname="Tika" toolversion="2.6.0" />
      <version toolname="Tika" toolversion="2.6.0">PDF/X-4</version>
      <externalIdentifier toolname="Droid" toolversion="6.5.2" type="puid">fmt/488</externalIdentifier>
    </identity>
  </identification>
  <fileinfo>
    <size toolname="Jhove" toolversion="1.26.1">13845166</size>
    <creatingApplicationName toolname="Jhove" toolversion="1.26.1">Adobe PDF Library 17.0/Adobe InDesign 18.1 (Macintosh)</creatingApplicationName>
    <lastmodified toolname="Exiftool" toolversion="12.50">2023-01-14T06:28:17Z</lastmodified>
    <created toolname="Exiftool" toolversion="12.50">2023-01-14T05:31:26Z</created>
    <filepath toolname="OIS File Information" toolversion="1.0" status="SINGLE_RESULT">/work/すごくわかる著作権と授業.pdf</filepath>
    <filename toolname="OIS File Information" toolversion="1.0" status="SINGLE_RESULT">すごくわかる著作権と授業.pdf</filename>
    <md5checksum toolname="OIS File Information" toolversion="1.0" status="SINGLE_RESULT">1f6a11a1b23607a0e29e10efdc153584</md5checksum>
    <fslastmodified toolname="OIS File Information" toolversion="1.0" status="SINGLE_RESULT">1684976952000</fslastmodified>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <document>
      <title toolname="Jhove" toolversion="1.26.1">ใ™ใ”ใใ‚ใ‹ใ‚‹่‘—ไฝœๆจฉใจๆŽˆๆฅญ_PDF็‰ˆ.indd</title>
      <author toolname="Exiftool" toolversion="12.50" status="SINGLE_RESULT">Adobe InDesign 18.1 (Macintosh)</author>
      <language toolname="Jhove" toolversion="1.26.1">ja-JP</language>
      <pageCount toolname="Jhove" toolversion="1.26.1">64</pageCount>
      <hasOutline toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">no</hasOutline>
      <hasAnnotations toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">yes</hasAnnotations>
      <graphicsCount toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">7</graphicsCount>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>DINNextLTPro-Medium</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>DINNextRoundedLTPro-Regular</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>GaramondPremrPro</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>GaramondPremrPro-Smbd</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>GothicMB101Pro-DeBold</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansFStdN-W3</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansFStdN-W4</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansFStdN-W5</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansFStdN-W6</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansRStdN-W4</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansRStdN-W5</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansRStdN-W6</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansStd-W3</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>ReimPro-ExBold</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>RyuminPro-ExBold</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>RyuminPro-Medium</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>ShueiMGoStd-B</fontName>
      </font>
    </document>
  </metadata>
  <statistics fitsExecutionTime="2256">
    <tool toolname="MediaInfo" toolversion="22.09" status="did not run" />
    <tool toolname="OIS Audio Information" toolversion="0.1" status="did not run" />
    <tool toolname="ADL Tool" toolversion="0.1" status="did not run" />
    <tool toolname="VTT Tool" toolversion="0.1" status="did not run" />
    <tool toolname="Droid" toolversion="6.5.2" executionTime="299" />
    <tool toolname="jpylyzer" toolversion="2.1.0" status="did not run" />
    <tool toolname="Jhove" toolversion="1.26.1" executionTime="1201" />
    <tool toolname="embARC" toolversion="0.2" status="did not run" />
    <tool toolname="file utility" toolversion="5.43" executionTime="963" />
    <tool toolname="Exiftool" toolversion="12.50" executionTime="983" />
    <tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="did not run" />
    <tool toolname="OIS File Information" toolversion="1.0" executionTime="231" />
    <tool toolname="OIS XML Metadata" toolversion="0.2" status="did not run" />
    <tool toolname="ffident" toolversion="0.2" executionTime="738" />
    <tool toolname="Tika" toolversion="2.6.0" executionTime="2137" />
  </statistics>
</fits>

The following is ChatGPT 4's explanation of this output.


The provided XML is an example of output generated by FITS (File Information Tool Set). FITS is a tool used to identify and report characteristics and metadata of digital files. This XML output shows the analysis results of a specific PDF file. Each section is explained below:

<identification> Section

  • Contains information about the file format, MIME type (in this case application/pdf), and the tools used to identify it (Droid, Exiftool, Tika).
  • The version tag indicates the format version reported for the file (PDF/X-4, as identified by Tika).
  • The externalIdentifier tag shows the PUID (PRONOM Persistent Unique Identifier) of the file format identified by the Droid tool, here fmt/488.
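
As a rough sketch of how this section could be read programmatically, the snippet below parses a trimmed copy of the output with Python's standard xml.etree.ElementTree. The function name read_identities and the dictionary keys are hypothetical, chosen only for illustration; the namespace URI is the one declared on the root fits element.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the root <fits> element of the output.
FITS_NS = {"f": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Trimmed copy of the <identification> section from the output.
SAMPLE = """
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" version="1.6.0">
  <identification>
    <identity format="PDF/X" mimetype="application/pdf" toolname="FITS" toolversion="1.6.0">
      <tool toolname="Droid" toolversion="6.5.2" />
      <tool toolname="Exiftool" toolversion="12.50" />
      <tool toolname="Tika" toolversion="2.6.0" />
      <version toolname="Tika" toolversion="2.6.0">PDF/X-4</version>
      <externalIdentifier toolname="Droid" toolversion="6.5.2" type="puid">fmt/488</externalIdentifier>
    </identity>
  </identification>
</fits>
""".strip()


def read_identities(xml_text):
    """Return one dict per <identity> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    results = []
    for identity in root.findall("f:identification/f:identity", FITS_NS):
        version = identity.find("f:version", FITS_NS)
        puid = identity.find("f:externalIdentifier[@type='puid']", FITS_NS)
        results.append({
            "format": identity.get("format"),
            "mimetype": identity.get("mimetype"),
            "version": version.text if version is not None else None,
            "puid": puid.text if puid is not None else None,
            # Tools that agreed on this identity.
            "tools": [t.get("toolname") for t in identity.findall("f:tool", FITS_NS)],
        })
    return results


identities = read_identities(SAMPLE)
for item in identities:
    print(item)
```

Because every element lives in the FITS namespace, lookups without the prefix mapping would return nothing, which is why FITS_NS is passed to each find call.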

<fileinfo> Section

  • Contains basic information about the file, such as file size, creating application, last modification date, creation date, file path, filename, MD5 checksum, and filesystem last modification date.
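
Two of these values can be cross-checked outside FITS. The sketch below assumes that fslastmodified is a Unix timestamp in milliseconds (the 13-digit value suggests this, but the output itself does not say so) and recomputes an MD5 checksum with hashlib. Both function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def epoch_ms_to_utc(ms_text):
    """Convert an epoch-milliseconds string to a timezone-aware UTC datetime.

    Assumption: the value counts milliseconds since 1970-01-01T00:00:00Z.
    """
    return datetime.fromtimestamp(int(ms_text) / 1000, tz=timezone.utc)


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Value of <fslastmodified> in the output.
fs_modified = epoch_ms_to_utc("1684976952000")
print(fs_modified.isoformat())
```

To verify the checksum, the result of md5_of_file on the downloaded PDF could be compared with the md5checksum value reported in the output (1f6a11a1b23607a0e29e10efdc153584).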

<filestatus> Section

  • Indicates whether the file is well-formed and valid, that is, whether it conforms to the format specification (here as judged by Jhove).
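
A minimal sketch of turning these two flags into booleans, for example to gate an ingest step, might look like this. The function name file_status is hypothetical, and returning None for a missing element is a design choice of this sketch rather than FITS behavior.

```python
import xml.etree.ElementTree as ET

FITS_NS = {"f": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Trimmed copy of the <filestatus> section from the output.
SAMPLE = """
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" version="1.6.0">
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">true</valid>
  </filestatus>
</fits>
""".strip()


def file_status(xml_text):
    """Return well-formed/valid as booleans, or None when not reported."""
    root = ET.fromstring(xml_text)

    def flag(name):
        node = root.find(f"f:filestatus/f:{name}", FITS_NS)
        if node is None or node.text is None:
            return None  # no tool reported this flag
        return node.text.strip().lower() == "true"

    return {"well_formed": flag("well-formed"), "valid": flag("valid")}


status = file_status(SAMPLE)
print(status)
```

Keeping None distinct from False separates "no tool checked this" from "a tool checked and the file failed".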

<metadata> Section

  • Contains metadata about the document, such as its title, author, language, page count, whether it has an outline or annotations, the number of graphics, and the fonts used.
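
The repeated font elements are easier to work with as a list. The following sketch, using a shortened copy of the document metadata, collects the language, page count, and font names. The function name document_metadata is hypothetical.

```python
import xml.etree.ElementTree as ET

FITS_NS = {"f": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Shortened copy of the <metadata> section (only three of the fonts).
SAMPLE = """
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" version="1.6.0">
  <metadata>
    <document>
      <language toolname="Jhove" toolversion="1.26.1">ja-JP</language>
      <pageCount toolname="Jhove" toolversion="1.26.1">64</pageCount>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>DINNextLTPro-Medium</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>HiraginoUDSansStd-W3</fontName>
      </font>
      <font toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">
        <fontName>RyuminPro-Medium</fontName>
      </font>
    </document>
  </metadata>
</fits>
""".strip()


def document_metadata(xml_text):
    """Collect a few <document> fields; returns None if the section is absent."""
    root = ET.fromstring(xml_text)
    doc = root.find("f:metadata/f:document", FITS_NS)
    if doc is None:
        return None  # e.g. the file is not a document type
    page_count = doc.findtext("f:pageCount", namespaces=FITS_NS)
    return {
        "language": doc.findtext("f:language", namespaces=FITS_NS),
        "page_count": int(page_count) if page_count is not None else None,
        "fonts": [n.text for n in doc.findall("f:font/f:fontName", FITS_NS)],
    }


meta = document_metadata(SAMPLE)
print(meta)
```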

<statistics> Section

  • Shows each tool's execution time, or a "did not run" status for tools that were not applied to this file. This reveals how long each tool took to analyze the file.
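
To see at a glance which tools ran and which was slowest, the statistics section could be split as in the sketch below. It treats a tool element with an executionTime attribute as having run and everything else as skipped; that reading, and the function name tool_statistics, are assumptions of this example. Only a subset of the tools from the output is included.

```python
import xml.etree.ElementTree as ET

FITS_NS = {"f": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Subset of the <statistics> section from the output.
SAMPLE = """
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" version="1.6.0">
  <statistics fitsExecutionTime="2256">
    <tool toolname="MediaInfo" toolversion="22.09" status="did not run" />
    <tool toolname="Droid" toolversion="6.5.2" executionTime="299" />
    <tool toolname="Jhove" toolversion="1.26.1" executionTime="1201" />
    <tool toolname="Exiftool" toolversion="12.50" executionTime="983" />
    <tool toolname="Tika" toolversion="2.6.0" executionTime="2137" />
  </statistics>
</fits>
""".strip()


def tool_statistics(xml_text):
    """Split tools into {name: executionTime} and a list of skipped names."""
    root = ET.fromstring(xml_text)
    ran, skipped = {}, []
    for tool in root.findall("f:statistics/f:tool", FITS_NS):
        name = tool.get("toolname")
        elapsed = tool.get("executionTime")
        if elapsed is not None:
            ran[name] = int(elapsed)
        else:
            skipped.append(name)  # e.g. status="did not run"
    return ran, skipped


ran, skipped = tool_statistics(SAMPLE)
slowest = max(ran, key=ran.get)
print("ran:", ran)
print("skipped:", skipped)
print("slowest:", slowest)
```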

This output presents a detailed analysis of a PDF file and provides important information for the fields of archiving, libraries, and digital preservation. FITS is used to integrate the results of various tools to provide a comprehensive view of file characteristics.


Summary

Using FITS, I was able to extract a range of information about a digital file, including one with a Japanese filename. I plan to keep investigating how FITS can be put to further use.