The Computer Oracle

Download file via HTTP only if changed since last update

--

Full question
https://superuser.com/questions/908293/d...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#linux #download #http

ACCEPTED ANSWER

Score 47


Consider using curl instead of wget:

curl -o "$file" -z "$file" "$uri"

man curl says:

-z/--time-cond <date expression>

(HTTP/FTP) Request a file that has been modified later than the given time and date, or one that has been modified before that time. The date expression can be all sorts of date strings or if it doesn't match any internal ones, it tries to get the time from a given file name instead.
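For HTTP, giving -z a file name makes curl send an If-Modified-Since request header carrying that file's modification time. A server that supports it answers 304 Not Modified, with no body, when the resource has not changed. Below is a minimal standard-library Python sketch of that exchange; the function names are hypothetical, and it illustrates the protocol rather than curl's own code.

```python
import email.utils
import os
import shutil
import urllib.error
import urllib.request


def if_modified_since_header(path):
    """Return the file's mtime as an HTTP-date string, or None if it is missing."""
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return None
    # HTTP dates are always expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
    return email.utils.formatdate(mtime, usegmt=True)


def download_if_modified(url, path):
    """Download url to path unless the server replies 304 Not Modified.

    Returns True if the file was (re)downloaded, False if it was up to date.
    """
    request = urllib.request.Request(url)
    header = if_modified_since_header(path)
    if header is not None:  # no local file yet: plain unconditional GET
        request.add_header("If-Modified-Since", header)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urlopen reports 304 as an HTTPError
            return False
        raise
    # Write to a temporary name first so an interrupted transfer never
    # leaves a truncated file under the final name.
    part_path = path + ".part"
    with response, open(part_path, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(part_path, path)
    return True
```

Like curl without --remote-time (-R), the sketch leaves the new file's mtime at its local write time, which is what the next request compares against.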

If $file might not exist yet, you'll need to make the -z flag conditional, using test -e "$file":

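if test -e "$file"
then zflag="-z $file"
else zflag=
fi
curl -o "$file" $zflag "$uri"

(Note that we don't quote the expansion of $zflag here, as we want it to undergo word splitting into 0 or 2 tokens. Quote characters stored inside a variable are not interpreted after expansion, so the flag string must not contain them, and this version only works when $file contains no whitespace or glob characters.)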

If your shell supports arrays (e.g. Bash), then we have a safer and cleaner version:

if test -e "$file"
then zflag=(-z "$file")
else zflag=()
fi
curl -o "$file" "${zflag[@]}" "$uri"
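Why is the array version safer? After expanding an unquoted variable, the shell splits the value on whitespace (with the default IFS) and does not re-interpret any quote characters stored in it; Python's shlex.split(), by contrast, parses quotes the way the shell parses a command line you type. The rough Python model below (it ignores globbing and non-default IFS values; the file name is hypothetical) shows what curl would receive from a string-based $zflag:

```python
import shlex


def split_unquoted_expansion(value):
    # Rough model of word splitting for an unquoted $var with the default IFS:
    # split on runs of whitespace and keep quote characters as ordinary text.
    return value.split()


file_name = "my file.txt"  # hypothetical file name containing a space

print(split_unquoted_expansion("-z " + file_name))
# ['-z', 'my', 'file.txt']: three words instead of two
print(split_unquoted_expansion("-z '" + file_name + "'"))
# ['-z', "'my", "file.txt'"]: the quotes end up inside the words
print(shlex.split("-z '" + file_name + "'"))
# ['-z', 'my file.txt']: what typing that text directly would give
```

Each element of the Bash array, expanded as "${zflag[@]}", becomes exactly one word, whatever characters the file name contains.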



ANSWER 2

Score 9


The wget switch -N (timestamping) only downloads the file if it has changed, so one approach is to use plain -N: it fetches the file when needed, but saves it under the name taken from the URL rather than the name you want. Then use ln -P to create a hard link with the correct name. Because a hard link is just another name for the same file, it shares all of the original's metadata, including its modification time.

The only limitation is that hard links cannot cross file system boundaries.
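A minimal Python sketch of this approach, assuming wget is on the PATH and the URL ends in a file name (so wget -N saves under the last component of the URL path). The helper names are hypothetical. Rather than rely on whether a re-download rewrites the existing file in place or replaces it with a new one, the sketch re-checks the link after every run and recreates it if the two names no longer refer to the same file:

```python
import os
import subprocess
from urllib.parse import urlsplit


def link_under_name(src, dst):
    """Make dst a hard link to src, replacing dst if it refers to another file."""
    if os.path.lexists(dst):
        if os.path.samefile(src, dst):
            return  # already the same inode: same data and metadata
        os.remove(dst)
    os.link(src, dst)  # raises OSError across file systems


def fetch_with_timestamping(url, wanted_name):
    # Assumption: wget names the file after the last path component of the URL.
    remote_name = os.path.basename(urlsplit(url).path)
    # -N: only download if the remote file is newer than the local copy
    subprocess.run(["wget", "-N", url], check=True)
    link_under_name(remote_name, wanted_name)
```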




ANSWER 3

Score 7


A Python 3.5+ script that wraps the curl command:

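import argparse
import pathlib
import subprocess

parser = argparse.ArgumentParser(
    description='Download URL to FILENAME only if the remote file is newer.')
parser.add_argument('url')
parser.add_argument('filename', type=pathlib.Path)
args = parser.parse_args()

# -s: silent; -o: write the response to FILENAME
cmd = ['curl', '-s', '-o', str(args.filename)]
# Only pass -z when the file exists, so curl has a timestamp to compare against
if args.filename.exists():
    cmd += ['-z', str(args.filename)]
cmd.append(args.url)

# check=True raises CalledProcessError if curl exits with a non-zero status
subprocess.run(cmd, check=True)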



ANSWER 4

Score 3


A similar approach to the "date check" (with "curl --time-cond") is to decide based on file size: download only if the local file's size differs from the remote file's.

This is useful, for example, when a download failed partway through: the incomplete local file then has a newer date than the remote file, even though it is truncated and needs to be downloaded again:

local_file_size=$([[ -f "$FILE_NAME" ]] && wc -c < "$FILE_NAME" || echo "0")
# Header names are case-insensitive (HTTP/2 responses use lowercase), so match in lowercase
remote_file_size=$(curl -sI "$FILE_URL" | awk 'tolower($1) == "content-length:" { print $2 }' | tr -d '\r')

if [[ "$local_file_size" -ne "$remote_file_size" ]]; then
    curl -o "$FILE_NAME" "$FILE_URL"
fi

The "curl -z / --time-cond" option suggested in another answer will not download the remote file in this case (because the local file has a newer date), but this size-check script will.
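The same size comparison can be sketched in Python with only the standard library. This is a minimal illustration, not a drop-in replacement for the script above: the function names are hypothetical, urlopen follows redirects (plain curl -sI does not), and when the server reports no Content-Length this sketch chooses to download rather than compare against 0:

```python
import os
import shutil
import urllib.request


def remote_content_length(url):
    """Send a HEAD request and return Content-Length as an int, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        # Header lookup on the response is case-insensitive.
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def local_size(path):
    """Size of the local file in bytes, or 0 if it does not exist (as in the shell version)."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


def needs_download(path, remote_size):
    # Design choice of this sketch: unknown remote size means download anyway.
    if remote_size is None:
        return True
    return local_size(path) != remote_size


def download_if_size_differs(url, path):
    """Download url to path when the sizes differ; return True if downloaded."""
    if not needs_download(path, remote_content_length(url)):
        return False
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return True
```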