Finding and removing duplicate files in OS X with a script

Full question
https://superuser.com/questions/481456/f...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#macos #bash

ANSWER 1

Score 65

Another option is to use fdupes:

brew install fdupes
fdupes -r .

fdupes -r . finds duplicate files recursively under the current directory. Add -d to delete duplicates; you'll be prompted for which files to keep in each set. With -dN instead, fdupes keeps the first file in each set and deletes the rest without prompting.
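
Before running -dN it may be worth listing what would go. The sketch below is only a preview helper: preview_deletions is a hypothetical wrapper name, and it assumes the first file fdupes lists in each set is the one -dN would keep, so the -f (omit first) listing should correspond to the deletion candidates. Compare it against a plain fdupes -r run before trusting it.

```shell
#!/bin/sh
# preview_deletions is a hypothetical wrapper name, not an fdupes command.
# It only lists files; nothing is deleted.
preview_deletions() {
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed (try: brew install fdupes)" >&2
        return 127
    fi
    # -r recurses into subdirectories.
    # -f omits the first file of each duplicate set from the output.
    fdupes -r -f "${1:-.}"
}
```

Call it as preview_deletions . and read the list; if it looks right, fdupes -rdN . is the destructive step.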

ACCEPTED ANSWER

Score 4

Firstly, you'll have to reorder the first command line so the order of files found by the find command is maintained:

find . -size 20 ! -type d -exec cksum {} \; | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/f.tmp > duplicates.txt

(Note: for testing on my machine I used find . -type f -exec cksum {} \; instead.)
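
For comparison, here is a minimal sketch of the same idea that avoids feeding patterns back into grep. It is not the command above, just one way to get an equivalent list: list_duplicates is a hypothetical helper name, it uses -type f rather than the -size 20 filter, and it assumes no file name contains a newline.

```shell
#!/bin/sh
# list_duplicates is a hypothetical helper name, not part of the answer above.
# Prints the "checksum size path" line of every file whose checksum and size
# occur more than once, in the order find reported them.
list_duplicates() {
    local dir="${1:-.}"
    local tmp
    tmp=$(mktemp) || return 1
    # One cksum line per regular file, in find order.
    find "$dir" -type f -exec cksum {} + > "$tmp"
    # The file is read twice: the first pass counts each "checksum size" key,
    # the second pass prints the lines whose key was seen more than once.
    awk 'NR == FNR { seen[$1 " " $2]++; next }
         seen[$1 " " $2] > 1' "$tmp" "$tmp"
    rm -f "$tmp"
}
```

Running list_duplicates . > duplicates.txt should give a file in the same "checksum size path" layout that the loop further down expects.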

Secondly, one way to print all but the first file of each set of duplicates is to use an auxiliary file, say /tmp/f2.tmp, that records the checksums already seen. Then we could do something like:

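while IFS= read -r line; do
    # cksum prints "<checksum> <size> <path>"; fields 1 and 2 identify the content
    checksum=$(printf '%s\n' "$line" | cut -f 1,2 -d' ')
    # take field 3 onward so that paths containing spaces stay intact
    file=$(printf '%s\n' "$line" | cut -f 3- -d' ')

    # -x matches whole lines, -F treats the checksum as a fixed string
    if grep -qxF "$checksum" /tmp/f2.tmp; then
        # /tmp/f2.tmp already contains the checksum
        # print the file name
        # (printf is safer than echo, when for example "$file" starts with "-")
        printf %s\\n "$file"
    else
        echo "$checksum" >> /tmp/f2.tmp
    fi
done < duplicates.txt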

Just make sure that /tmp/f2.tmp exists and is empty before you run this, for example with:

rm -f /tmp/f2.tmp
touch /tmp/f2.tmp
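
If you would rather not manage the auxiliary file, the same "all but the first" filter can be sketched in a single awk pass that keeps the seen checksums in memory. This is a different technique from the loop above, not a correction of it; extra_copies is a hypothetical name, and the input is assumed to be "checksum size path" lines as produced by cksum.

```shell
#!/bin/sh
# extra_copies is a hypothetical helper name.
# Reads "checksum size path" lines on stdin and prints the path of every
# line whose checksum and size have already been seen earlier in the input.
extra_copies() {
    awk '{
        key = $1 " " $2
        if (seen[key]++) {
            # Drop the first two fields; the rest of the line is the path,
            # spaces included.
            path = $0
            sub(/^[^ ]+ [^ ]+ /, "", path)
            print path
        }
    }'
}
```

Usage would be extra_copies < duplicates.txt. The output is a list to review, nothing is deleted.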

Hope this helps =)

ANSWER 3

Score 4

I wrote a script that renames your files to match a hash of their contents.

It uses a subset of the file's bytes so it's fast, and if there's a collision it appends a counter to the name like this:

3101ace8db9f.jpg
3101ace8db9f (1).jpg
3101ace8db9f (2).jpg

This makes it easy to review and delete duplicates on your own, without trusting somebody else's software with your photos more than you need to.

Script: https://gist.github.com/SimplGy/75bb4fd26a12d4f16da6df1c4e506562
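
To illustrate the naming scheme only, here is a rough shell sketch. It is not the linked script: content_name and rename_by_content are hypothetical names, the hash is a cksum CRC of the first 64 KiB (an assumed subset size), and files that agree in those bytes but differ later would still share a base name, so review before deleting.

```shell
#!/bin/sh
# content_name and rename_by_content are hypothetical helpers, not the gist.

# Print the target path for a file: "<crc><ext>", or "<crc> (N)<ext>" if taken.
content_name() {
    local file="$1"
    local dir base ext hash candidate
    local n=0
    dir=$(dirname -- "$file")
    base=$(basename -- "$file")
    # Keep the extension if the name has one.
    case $base in
        *.*) ext=".${base##*.}" ;;
        *)   ext="" ;;
    esac
    # Checksum only the first 64 KiB so large files stay fast.
    hash=$(head -c 65536 < "$file" | cksum | cut -d' ' -f1)
    candidate="$dir/$hash$ext"
    # On a collision with a different file, append a counter.
    while [ -e "$candidate" ] && [ ! "$candidate" -ef "$file" ]; do
        n=$((n + 1))
        candidate="$dir/$hash ($n)$ext"
    done
    printf '%s\n' "$candidate"
}

# Rename each regular file given as an argument; mv -n never overwrites.
rename_by_content() {
    local file target
    for file in "$@"; do
        [ -f "$file" ] || continue
        target=$(content_name "$file") || return 1
        [ "$target" -ef "$file" ] || mv -n -- "$file" "$target"
    done
}
```

Something like rename_by_content ./photos/* would then leave likely duplicates sitting next to each other with the (1), (2) suffixes.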

ANSWER 4

Score 0

This can be done with the EagleFiler app, developed by Michael Tsai, using the following AppleScript:

tell application "EagleFiler"
    set _checksums to {}
    set _recordsSeen to {}
    set _records to selected records of browser window 1
    set _trash to trash of document of browser window 1
    repeat with _record in _records
        set _checksum to _record's checksum
        set _matches to my findMatch(_checksum, _checksums, _recordsSeen)
        if _matches is {} then
            set _checksums to {_checksum} & _checksums
            set _recordsSeen to {_record} & _recordsSeen
        else
            set _otherRecord to item 1 of _matches
            if _otherRecord's modification date > _record's modification date then
                set _record's container to _trash
            else
                set _otherRecord's container to _trash
                set _checksums to {_checksum} & _checksums
                set _recordsSeen to {_record} & _recordsSeen
            end if
        end if
    end repeat
end tell

on findMatch(_checksum, _checksums, _recordsSeen)

    tell application "EagleFiler"
        if _checksum is "" then return {}
        if _checksums contains _checksum then
            repeat with i from 1 to length of _checksums
                if item i of _checksums is _checksum then
                    return item i of _recordsSeen
                end if
            end repeat
        end if
        return {}
    end tell

end findMatch

You can also automatically delete duplicates with duplicate file remover suggested in this post.