Overview


SyncBack uses file hashing (e.g. MD5) for several important operations, including file integrity checking, copy verification, and rename detection. While hashing a local file is typically very fast, hashing a file on a remote location such as a NAS, network share, or mapped drive can be significantly slower. This article explains why.


The Problem


To hash a file, every byte of that file must be read. When the file is on a remote location, every byte must be transferred over the network to your computer before it can be hashed. This effectively turns a hashing operation into a full file download, even though the file itself is not being copied.


For example, to verify a 10 GB file after copying it to a network share, SyncBack must read all 10 GB back over the network just to calculate its hash value. This is the equivalent of copying the file a second time.
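To illustrate why every byte must be read, here is a minimal Python sketch of hashing a file in chunks. It is not SyncBack's implementation; the function name md5_of_file and the 1 MB chunk size are assumptions made for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the file at `path`.

    The file is read in chunks so memory use stays small, but every byte
    is still read. If `path` is on a network share, each chunk has to
    travel over the network before it can be fed to the hash.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # chunk_size is an illustrative choice
            if not chunk:
                break  # end of file reached
            digest.update(chunk)
    return digest.hexdigest()
```

There is no way to skip part of the file: changing a single byte anywhere changes the resulting hash, so the whole file has to pass through the loop.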


What Determines the Speed


Three components form a chain, and the slowest one determines the overall speed:


1. Storage read speed - how fast the remote drive can read the file

2. Network speed - how fast the data can travel across the network

3. Hash computation speed - how fast your CPU can calculate the hash


Only one of these is the bottleneck at any given time, and which one it is depends on your hardware.
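The chain can be modelled as taking the minimum of the three speeds. The following Python sketch shows the idea; the function name and the sample figures are assumptions chosen for illustration, not measurements.

```python
def find_bottleneck(storage_mb_s, network_mb_s, hash_mb_s):
    """Return (name, speed) of the slowest stage, all speeds in MB/s."""
    stages = {
        "storage": storage_mb_s,
        "network": network_mb_s,
        "hash": hash_mb_s,
    }
    # The slowest stage limits the whole chain.
    name = min(stages, key=stages.get)
    return name, stages[name]


# Hypothetical figures: a hard drive NAS on Gigabit Ethernet.
print(find_bottleneck(storage_mb_s=150, network_mb_s=110, hash_mb_s=650))
```

Making one stage faster only helps until another stage becomes the slowest.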


Real-World Examples


Example 1: NAS with a hard drive over a 1 Gbps network


This is a very common home and small office setup: a NAS with traditional hard drives connected via a standard Gigabit Ethernet network.


- Hard drive sequential read speed: approximately 100 to 180 MB/s

- 1 Gbps network (real-world over SMB): approximately 110 MB/s

- MD5 hash computation: approximately 500 to 800 MB/s


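The network is usually the bottleneck, although a hard drive at the slow end of its range (around 100 MB/s) would limit the speed slightly further. Hashing a 10 GB file would take approximately 1.5 to 2 minutes. That is on top of the time already spent copying the file.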


Example 2: NAS with NVMe storage over a 10 Gbps network


This is a higher-end setup with fast solid-state storage and a 10 Gigabit network connection.


- NVMe sequential read speed: approximately 3,000 to 7,000 MB/s

- 10 Gbps network (real-world over SMB): approximately 1,000 to 1,100 MB/s

- MD5 hash computation: approximately 500 to 800 MB/s


The CPU hash computation is now the bottleneck. Hashing a 10 GB file would take approximately 12 to 20 seconds.
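The estimates in both examples can be reproduced with simple arithmetic. This Python sketch assumes 1 GB = 1,000 MB and uses the approximate speed ranges listed above; the function name is hypothetical.

```python
def hash_time_seconds(size_gb, storage_mb_s, network_mb_s, hash_mb_s):
    """Estimate the seconds needed to hash a remote file of `size_gb` GB."""
    effective_mb_s = min(storage_mb_s, network_mb_s, hash_mb_s)
    return size_gb * 1000 / effective_mb_s  # assumes 1 GB = 1,000 MB


# Example 1: hard drive NAS over 1 Gbps (best case, then worst case).
print(f"{hash_time_seconds(10, 180, 110, 800):.1f} s")    # 90.9 s
print(f"{hash_time_seconds(10, 100, 110, 500):.1f} s")    # 100.0 s

# Example 2: NVMe NAS over 10 Gbps (best case, then worst case).
print(f"{hash_time_seconds(10, 7000, 1100, 800):.1f} s")  # 12.5 s
print(f"{hash_time_seconds(10, 3000, 1000, 500):.1f} s")  # 20.0 s
```

Real-world times are likely to be somewhat longer because of protocol overhead, other network traffic, and disk activity on the NAS.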


Scaling Up


These times are per file. If a profile contains hundreds of large files, the total time spent hashing can add up to hours. For example, hashing 500 GB of files over a 1 Gbps network would take approximately 75 to 85 minutes.
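To get a rough idea of how long hashing an entire folder might take, the total size can be added up and divided by the expected throughput. The Python sketch below is one way to do that; the function names and the example path are assumptions, and the throughput figure should be replaced with a value measured on your own network.

```python
import os


def total_size_bytes(root):
    """Sum the sizes of all files under `root` (reads metadata only)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip files that vanish or cannot be accessed
    return total


def estimated_hash_minutes(total_bytes, throughput_mb_s):
    """Estimate minutes to hash `total_bytes` at `throughput_mb_s` MB/s."""
    return total_bytes / 1_000_000 / throughput_mb_s / 60


# 500 GB at roughly 110 MB/s (1 Gbps over SMB) is about 76 minutes.
print(round(estimated_hash_minutes(500 * 1_000_000_000, 110)))  # 76
```

For a real folder, the two functions can be combined, for example estimated_hash_minutes(total_size_bytes(r"\\nas\backup"), 110), where the share path is a hypothetical placeholder.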


Why It Matters for SyncBack


Several SyncBack features require hashing remote files:


  • Copy verification reads back the file that was just copied to confirm it was written correctly. This is the most reliable way to verify a copy, but it doubles the network traffic for every file.
  • Integrity checking reads files to verify they have not changed or become corrupted since they were last backed up.
  • Rename detection can require hashing files on both the source and destination to determine whether a file has been renamed rather than deleted and recreated.
  • Fast Backup using file hashes compares hash values to detect changes, which requires reading the full contents of files that may not have changed.
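As an illustration of why rename detection can involve hashing on both sides, here is a simplified Python sketch. It is not SyncBack's actual algorithm: the function names are hypothetical, and the approach (compare sizes first, then hash only the files whose sizes match) is an assumption made for this example.

```python
import hashlib
import os
from collections import defaultdict


def _md5(path, chunk_size=1024 * 1024):
    """Hash a file by reading all of its contents in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def detect_renames(source_only, dest_only):
    """Pair files that exist under different names on each side.

    `source_only` and `dest_only` are lists of paths whose names appear
    on one side only. Returns a list of (dest_path, source_path) pairs
    that have the same size and the same MD5 hash.
    """
    by_size = defaultdict(list)
    for path in dest_only:
        by_size[os.path.getsize(path)].append(path)

    renames = []
    for src in source_only:
        candidates = by_size.get(os.path.getsize(src), [])
        if not candidates:
            continue  # no size match, so no file contents need to be read
        src_hash = _md5(src)  # full read of the source file
        for dst in candidates:
            if _md5(dst) == src_hash:  # full read of the destination file
                renames.append((dst, src))
                candidates.remove(dst)  # each file is matched at most once
                break
    return renames
```

Even with the size check, every candidate pair requires reading both files in full, and the destination read happens over the network when the destination is remote.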


Recommendations


If profile runs are taking longer than expected, consider the following:


  • Copy verification is the most common cause of slow profiles when using network storage. If your network and storage are reliable, you may choose to disable verification or use a less thorough method. Be aware that disabling verification means corrupted copies will not be detected.
  • Fast Backup using file modification dates and sizes is much faster than hash-based comparison because it only reads file metadata, not file contents.
  • Rename detection can be disabled if it is not needed. Without it, a renamed file is treated as a deletion and a new file, which may be acceptable depending on your use case.
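  • A faster network connection will help whenever the network is the bottleneck, which is the most common case. Upgrading from 1 Gbps to 2.5 Gbps or 10 Gbps can significantly reduce hashing time, until the storage read speed or the hash computation becomes the limiting factor instead.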
  • FTP and SFTP servers that support server-side hashing commands (such as MD5, XMD5, XCRC, or SHA-1) allow the hash to be calculated on the server without transferring the file contents over the network. SyncBack will use these commands automatically when the server supports them.
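The difference between a metadata comparison and a hash comparison, mentioned in the second recommendation above, can be sketched in Python as follows. This is not how SyncBack implements Fast Backup; the function name and the 2-second tolerance (chosen because FAT file systems store modification times with 2-second resolution) are assumptions for this example.

```python
import os


def probably_unchanged(source_path, dest_path, mtime_tolerance_s=2.0):
    """Compare two files using size and modification time only.

    os.stat() requests metadata and does not read file contents, so the
    cost is a small request per file regardless of the file's size.
    """
    src = os.stat(source_path)
    dst = os.stat(dest_path)
    same_size = src.st_size == dst.st_size
    # Tolerance is an assumption to allow for coarse timestamp resolution.
    same_time = abs(src.st_mtime - dst.st_mtime) <= mtime_tolerance_s
    return same_size and same_time
```

The trade-off is that a metadata check cannot notice a file whose contents changed while its size and modification time stayed the same, which a hash comparison would detect.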