The Computer Oracle

Detecting blank image files

--

Chapters
00:00 Detecting Blank Image Files
00:48 Answer 1 Score 5
01:16 Answer 2 Score 2
02:02 Answer 3 Score 1
02:21 Answer 4 Score 1
03:26 Thank you

--

Full question
https://superuser.com/questions/343385/d...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#linux #scanning #imageprocessing #imaging

#avk47



ANSWER 1

Score 5


Use the identify command of the ImageMagick CLI, documented here:

http://www.imagemagick.org/script/identify.php

With command:

$ identify -format "%k" source.png

The %k escape prints the number of unique colors. If that number is 1, you have a blank page.

You can also use the command:

identify -verbose source.png

The standard deviation, skew and kurtosis will be 0 for a blank image.
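
For batch use, the color-count check can be wrapped in a small shell function. This is only a minimal sketch: the name is_single_color is hypothetical, it assumes ImageMagick's identify is on the PATH, and it assumes single-frame images. Real scans usually contain some noise, so a strict count of 1 will mostly match synthetic or fully clipped pages.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the image contains exactly one unique color.
# %k is identify's format escape for the number of unique colors.
# Assumes a single-frame image; multi-page files would print one count per frame.
is_single_color() {
    local colors
    colors=$(identify -format "%k" "$1") || return 2
    [ "$colors" -eq 1 ]
}
```

It could then be used in a loop such as: for f in *.png; do is_single_color "$f" && echo "$f looks blank"; done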




ANSWER 2

Score 2


Slightly improved version of the code in the question:

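#!/bin/bash
# Moves scanned pages that look blank into ./blanks.

mkdir -p "blanks"

for i in "$@"; do
    echo "${i}"
    # A hidden marker file next to the image (.a.pnm for a.pnm) protects it.
    if [[ -e $(dirname "$i")/.$(basename "$i") ]]; then
        echo "   protected."
        continue
    fi

    # Reduce to pure black and white, then count the pixels of each color.
    histogram=$(convert "${i}" -threshold 50% -format %c histogram:info:-)
    #echo $histogram
    white=$(echo "${histogram}" | grep "white" | cut -d: -f1)
    black=$(echo "${histogram}" | grep "black" | cut -d: -f1)
    # grep prints nothing for an absent color, so default both counts to 0.
    white=${white:-0}
    black=${black:-0}
    # No white pixels: not a blank page, and dividing by zero must be avoided.
    if [[ "$white" -eq 0 ]]; then
        continue
    fi

    # Blank if black pixels amount to less than 0.5% of the white pixels.
    blank=$(echo "scale=4; ${black}/${white} < 0.005" | bc)
    #echo $white $black $blank
    if [ "${blank}" -eq "1" ]; then
        echo "${i} seems to be blank - moving it to blanks/..."
        # Use the base name so images given with a directory path still land in blanks/.
        mv "${i}" "blanks/$(basename "${i}")"
    fi
done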

Changes:

  • Pass the images to check as arguments instead of reading from a fixed location
  • Progress report
  • If the script misclassifies a page as blank, you can protect that file: create an empty file with the image's name plus a leading dot (e.g. to protect a.pnm, run touch .a.pnm)
  • Fixed error when there were no black pixels in the input
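
To get a feel for the 0.005 threshold used in the script, the comparison can be isolated into a function and fed pixel counts directly. This is a minimal sketch: is_blank_ratio is a hypothetical name, awk stands in for bc here, and the counts in the usage note are made-up figures for a 4960x7016 page (34,799,360 pixels in total).

```shell
# Hypothetical helper mirroring the script's test: succeeds when black/white < 0.005.
# Arguments: black pixel count, white pixel count.
# A white count of 0 is treated as "not blank", which also avoids dividing by zero.
is_blank_ratio() {
    awk -v b="$1" -v w="$2" 'BEGIN { exit !(w > 0 && b / w < 0.005) }'
}
```

With 100000 black and 34699360 white pixels the ratio is about 0.0029, so is_blank_ratio 100000 34699360 succeeds. With 500000 black and 34299360 white pixels the ratio is about 0.0146, so the page is kept.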



ANSWER 3

Score 1


My trick is to scan the images to a losslessly compressed format (tiff + compression). This way, blank pages have a much lower file size and I can detect them with find, move them to another directory, check them quickly with a viewer and then get rid of them.
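
The size-based approach might look like the following with find. This is a sketch under assumptions: the function name move_small_scans, the .tif extension and the byte threshold are all hypothetical. A sensible threshold depends on scanner, resolution and compression, so calibrate it against a few known blank pages.

```shell
# Hypothetical helper: move TIFF files smaller than a byte threshold for manual review.
# Arguments: scan directory, size threshold in bytes, destination directory.
move_small_scans() {
    local dir="$1" max_bytes="$2" dest="$3"
    mkdir -p "$dest"
    # -size -Nc matches files smaller than N bytes; -maxdepth 1 skips subdirectories.
    find "$dir" -maxdepth 1 -type f -name '*.tif' -size "-${max_bytes}c" \
        -exec mv {} "$dest"/ \;
}
```

For example, move_small_scans scans 20000 scans/blanks would move every .tif under 20000 bytes from scans into scans/blanks, where the files can be checked with a viewer before deleting them.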




ANSWER 4

Score 1


You can do a noisy trim with ImageMagick, e.g.:

convert image-0001.png -virtual-pixel White -blur 0x15 -fuzz 15% -trim info:

The page isn't empty if convert prints something like this:

image-0001.png PNG 4565x6129 4960x7016+279+816 8-bit Gray 0.000u 0:00.000

(example input is a 600 dpi DIN A4 scanned lineart image)

It's empty if the height/width after trimming is suspiciously small, e.g.:

image-0001.png PNG 2505x40 4960x7016+0+6976 8-bit Gray 0.000u 0:00.000

In contrast to the threshold histogram method, this produces fewer false positives when you have pages that contain just a word or a line of text. With a threshold histogram, such pages could wrongly be detected as empty.

Using the file size of the compressed image as an approximation of entropy yields the same false positives.

On the flip side, pages that are empty apart from perforations probably won't be detected as empty with a noisy trim alone. If you care about those, it might make sense to tell ImageMagick to unconditionally trim some margin space first. For example, if the image was scanned at 600 dpi and you want to ignore a 1 inch margin all around (-shave WxH removes W pixels from the left and right and H pixels from the top and bottom):

convert i1.png -shave 600x600 -virtual-pixel White -blur 0x15 -fuzz 15% -trim info:
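
The "suspiciously small" judgement on the trimmed size can be automated. The following is a minimal sketch, not a tuned detector: the function names and the 10% cutoff are hypothetical, and it assumes that the %w/%h escapes report the trimmed size while %W/%H report the page (canvas) size that appears as the second geometry in the info: output above. It is worth checking on a known blank page how your ImageMagick version reports an image that trims away entirely.

```shell
# Hypothetical helper: succeeds when the trimmed area is small relative to the page.
# Arguments: trimmed width, trimmed height, page width, page height (pixels).
# The 10% cutoff is an assumption; tune it on known blank and non-blank pages.
trim_looks_empty() {
    local w="$1" h="$2" page_w="$3" page_h="$4"
    [ $((w * 10)) -lt "$page_w" ] || [ $((h * 10)) -lt "$page_h" ]
}

# Runs the noisy trim and feeds the resulting sizes to trim_looks_empty.
# %w/%h are the trimmed image size, %W/%H the original page (canvas) size.
page_is_empty() {
    local dims
    dims=$(convert "$1" -virtual-pixel White -blur 0x15 -fuzz 15% -trim \
        -format '%w %h %W %H' info:) || return 2
    # Intentionally unquoted so the four numbers become four arguments.
    trim_looks_empty $dims
}
```

With the two example outputs above, 4565x6129 on a 4960x7016 page is kept, while 2505x40 is flagged as empty because the trimmed height is far below 10% of the page height.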