Re: [Hampshire] Open source network backup with de-dupe.

Author: James Courtier-Dutton
Date:  
To: Chris Dennis
CC: Hampshire LUG Discussion List
Subject: Re: [Hampshire] Open source network backup with de-dupe.
On 15 July 2010 20:44, Chris Dennis <cgdennis@???> wrote:
> On 15/07/10 15:39, James Courtier-Dutton wrote:
>>
>> Take one central-site PC called "A".
>> Take two remote-site PCs called "B" and "C".
>>
>> B has already sent a full backup to A.
>> C wishes to send a full backup to A, but much of the data on C is the
>> same as on B.
>> C generates hashes of its files, and sends only the hashes to A.
>> A responds to C, saying which hashes it has not already got from B.
>> C then sends only a subset of the data, i.e. data that was not already
>> sent from B.
>>
>> Thus, a lot of WAN bandwidth is saved.
>
> The problem is that hash collisions can occur.  Two files with the same hash
> are /probably/ the same file, but probably isn't good enough -- a backup
> system has to be 100% sure.  And the only way to be certain is to get both
> files and compare them byte by byte.
>
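
For concreteness, the exchange quoted above might look something like
this minimal sketch (Python; the function names, the use of SHA-256
whole-file hashes, and the in-memory "protocol" are all illustrative
assumptions, not a real implementation):

import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    # SHA-256 of the whole file (an assumption; any strong hash fits here)
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def site_c_offer(files):
    # C builds a hash -> path map for everything it wants to back up
    return {file_hash(p): p for p in files}

def site_a_wanted(offered, already_stored):
    # A replies with the hashes it has not already got (e.g. from B)
    return set(offered) - set(already_stored)

# C then uploads only the files whose hash A asked for; for everything
# else, A records a reference to the copy it already holds.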


There are algorithms that guard against hash collisions without having
to send the entire file; rsync, for example, relies on them.
Say you change one byte in a large file: rsync will not send the
entire file again, it will send only the changed part.
If I followed your argument, I would have to stop using rsync.
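
To illustrate the block-level idea, here is a very rough sketch in
Python. It is not rsync's actual algorithm -- the real thing also uses
a rolling weak checksum so matches survive insertions and deletions,
and it negotiates its own block size -- but it shows why one changed
byte does not cost a whole-file transfer:

import hashlib

BLOCK = 4096  # assumed fixed block size

def block_sums(data):
    # strong checksum per block (rsync pairs this with a rolling weak sum)
    return [hashlib.md5(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def blocks_to_send(old, new):
    # indices of blocks that differ -- only these need to cross the WAN
    old_sums = block_sums(old)
    return [i for i, s in enumerate(block_sums(new))
            if i >= len(old_sums) or s != old_sums[i]]

# Changing one byte in a 100 MB file dirties a single 4 KB block, so
# roughly 4 KB is transferred instead of 100 MB.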

Kind Regards

James