Does Git prevent data degradation
Chapters
00:00 Does Git Prevent Data Degradation
00:36 Accepted Answer Score 62
01:05 Answer 2 Score 16
02:36 Answer 3 Score 1
03:38 Thank you
--
Full question
https://superuser.com/questions/1253830/...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#git #zfs #btrfs #dataintegrity
ACCEPTED ANSWER
Score 62
Git computes its hashes only when objects such as commits are created, and from then on the hashes are used to identify those objects. This in no way ensures the integrity of the files. Git repos can get corrupted and lose data. In fact, Git has a built-in command to detect this kind of loss, git fsck, but as the documentation says, you are responsible for restoring any corrupted data from backups.
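To make the point about when Git hashes concrete, here is a minimal Python sketch of how a blob's object ID is derived, assuming a repository that uses the default SHA-1 object format (not the newer SHA-256 format). The function name git_blob_id is hypothetical. The ID is derived purely from the content at the moment the object is written, which is why later damage to the stored bytes goes unnoticed until something recomputes it, for example git fsck.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content.

    Git hashes the header "blob <size>", a NUL byte, and then the raw bytes,
    so the ID depends only on the content, not on file names or timestamps.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Expected to match `git hash-object` for the same bytes in a SHA-1 repo.
    print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```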
ANSWER 2
Score 16
Depends on what you mean by "prevent".
(First of all, bit-rot is a term with multiple definitions. This question is not about code becoming unrunnable due to lack of maintenance.)
If by "prevent" you mean that it will likely detect corruption caused by decaying bits, then yes, that works. It will not, however, help to fix that corruption: the hashes only provide error detection, not correction.
This is generally what is meant by "integrity": the ability to detect unauthorized or unintended manipulation of data, not the ability to prevent or correct it.
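A small Python sketch of detection without correction: a hash recorded when the data was written reveals a single flipped bit, but offers no way to restore the original. The helper names (sha1_hex, flip_bit) and the sample data are illustrative only, not part of Git.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest, standing in for a stored integrity hash."""
    return hashlib.sha1(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated bit rot)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


original = b"important file contents\n"
stored_hash = sha1_hex(original)      # recorded when the data was written

decayed = flip_bit(original, 42)      # one bit silently changes on disk
detected = sha1_hex(decayed) != stored_hash
print("corruption detected:", detected)  # corruption detected: True

# The mismatch says *that* the data changed, not *which* bit changed or what
# it used to be, so the hash alone cannot restore the original.
```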
You would generally still want RAID1 together with backups (possibly implemented with ZFS snapshots or similar; I am not familiar with the ZFS semantics of RAID1 + snapshots), for several reasons:
- If a disk fails fatally, you need either RAID1 or a recent backup to restore your data; no error correction can compensate for a whole disk failing unless it has a full copy of the data (RAID1). To keep downtime short, you essentially must have RAID1.
- If you accidentally delete part or all of the repository, you need a backup (RAID1 does not protect you, since it immediately mirrors the change to all devices).
Block-level RAID1 (e.g. via LVM or similar) with only two disks will not, by itself, protect you against silent decay of data, though: the RAID controller cannot know which of the two disks holds the correct data. You need additional information for that, such as a checksum over files. This is where the ZFS and btrfs checksums come in: they can be used (which is not to say that they are used in these cases; I don't know how ZFS or btrfs handle things there) to distinguish which of the two disks holds the correct data.
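The idea of using a checksum to break the tie between two mirrored copies can be sketched in Python as follows. This is a hypothetical illustration of the principle only, not how ZFS or btrfs actually implement it; the function name pick_good_copy is made up, and SHA-256 is an arbitrary choice here, since real filesystems use their own checksum algorithms and on-disk layouts.

```python
import hashlib
from typing import Optional


def pick_good_copy(copy_a: bytes, copy_b: bytes, expected_sha256: str) -> Optional[bytes]:
    """Return the mirrored copy whose checksum matches, or None if neither does.

    Comparing copy_a with copy_b only reveals that they differ; the stored
    checksum is the extra information that says which one is still correct.
    """
    for candidate in (copy_a, copy_b):
        if hashlib.sha256(candidate).hexdigest() == expected_sha256:
            return candidate
    return None  # both copies damaged: only a backup can help now


if __name__ == "__main__":
    good = b"block payload"
    checksum = hashlib.sha256(good).hexdigest()  # stored alongside the data
    rotten = b"block paylOad"                    # silent decay on one disk
    print(pick_good_copy(rotten, good, checksum) == good)  # True
```

Rewriting the damaged copy from the good one, which a real system would want to do next, is left out of this sketch.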
ANSWER 3
Score 1
prevent bit-rot
No, it does not, in any way at all. Git introduces no RAID-like redundancy. If the files in your .git directory suffer bit-rot, you will lose stuff just as usual.
help against bit-rot?
Yyyy...no. It does not help against bit-rot occurring, but it will help to detect bit-rot. However, during normal use it never does so on its own (well, obviously it does when you check out some objects and so on, but not for your whole history). You would have to create cron jobs that recalculate the hashes from the content and compare them to the stored hashes. This is pretty trivial: git hashes are simply hashes of the content, so they are easy to recalculate, and git fsck does so for you. But when it detects bit-rot, there is nothing in particular that it can do about it. Specifically, because objects are stored compressed, a single flipped bit in a larger object will likely cost you the entire object.
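The recompute-and-compare check described above, and the effect of compression on a single flipped bit, can be sketched in Python. This assumes Git's loose-object layout (a zlib-compressed "blob <size>" header, a NUL byte, then the content, identified by the SHA-1 of the uncompressed bytes) and ignores packfiles entirely; the function names are hypothetical, and in practice git fsck does this job for you.

```python
import hashlib
import zlib
from typing import Tuple


def make_loose_blob(content: bytes) -> Tuple[bytes, str]:
    """Build (compressed bytes, object ID) the way Git stores a loose blob."""
    raw = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    return zlib.compress(raw), hashlib.sha1(raw).hexdigest()


def verify_loose_object(compressed: bytes, expected_id: str) -> bool:
    """Recompute an object's ID from its stored bytes and compare.

    Returns False if the stream no longer decompresses or if the SHA-1 of the
    decompressed bytes differs from the expected ID.
    """
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False  # undecodable stream: the whole object is gone
    return hashlib.sha1(raw).hexdigest() == expected_id


if __name__ == "__main__":
    # Pseudo-random, non-repetitive sample content (hypothetical test data).
    content = b"".join(hashlib.sha256(i.to_bytes(2, "big")).digest() for i in range(32))
    stored, object_id = make_loose_blob(content)
    print(verify_loose_object(stored, object_id))  # True

    damaged = bytearray(stored)
    damaged[len(damaged) // 2] ^= 0x01  # simulate one flipped bit on disk
    print(verify_loose_object(bytes(damaged), object_id))  # False
```

The check only reports the damage; recovering the object still means fetching it from a backup or another clone.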