“Partial File Updating” – Incremental File Uploads with High-Sync

High-Sync can detect which parts of a file have changed and copy only the changed blocks rather than the complete file. The goal is to reduce copy time once the initial file has been uploaded. We also call this partial file updating. The feature is similar to, but does not use, the rsync protocol. Except when the Remote Service is used (Method 2 below), High-Sync must be the only program modifying the destination files so that it can track changes correctly.

Incremental file uploads are supported with the SSH/SFTP protocol and for drive-to-drive copies. You can also use them over a VPN (tunnel) when saving files to a shared drive on the remote side. If you need incremental block uploads with other clouds or protocols that don’t support them, you may be able to achieve this by storing your files in zip archives with “Synthetic Backup”. See Method 3 below.

Incremental block-level uploading saves the most time with large, block-oriented file types and may be less effective for small or stream-based files. Block-oriented files include database files such as SQL databases or Outlook PST files, as well as drive images and virtual hard disk images (VMs). With stream-based files, on the other hand, a modification usually changes all blocks (for example text documents, spreadsheets, zip files, and photos). To illustrate, Microsoft Word’s .docx format is actually a .zip file, so even a small text change modifies the entire file because it is saved with compression. If the entire file changes, block-level copying cannot save much time or bandwidth. External programs such as hard drive defragmentation tools may also trigger changes that require an entire file to be re-uploaded.
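To make the difference concrete, here is a minimal Python sketch that splits two versions of a file into fixed-size blocks and counts the blocks whose MD5 digest changed. The 4 KiB block size and the helper names `block_digests` and `changed_blocks` are assumptions for illustration, not High-Sync’s actual implementation. An in-place page update, as a database engine performs it, touches one block, while inserting a single byte near the start shifts all later data so that every block at a fixed offset differs. Compressed formats such as .docx tend to behave like the second case.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed block size, for illustration only

def block_digests(data: bytes) -> list[str]:
    """MD5 hex digest of each fixed-size block; the last block may be shorter."""
    return [hashlib.md5(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old: bytes, new: bytes) -> list[int]:
    """Indices of blocks in `new` that differ from the block at the same offset in `old`."""
    old_d, new_d = block_digests(old), block_digests(new)
    return [i for i, d in enumerate(new_d) if i >= len(old_d) or old_d[i] != d]

original = random.Random(42).randbytes(16 * BLOCK_SIZE)  # 16 full blocks

# Block-oriented change: one byte inside "page" 5 is overwritten in place.
page_update = bytearray(original)
page_update[5 * BLOCK_SIZE + 10] ^= 0xFF

# Stream-style change: one byte inserted near the start shifts everything after it.
inserted = original[:100] + b"!" + original[100:]

print(changed_blocks(original, bytes(page_update)))  # [5]
print(len(changed_blocks(original, inserted)))       # 17
```

Only one of 16 blocks would need to be copied for the page update, but all 17 blocks of the shifted version would.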

While block-level copying is certainly advantageous, some services simply can’t add it. To perform block-level analysis on a file, the service must be able to read it, which it can’t do without knowing the file’s encryption key. In particular, we’re talking about zero-knowledge cloud services: with a zero-knowledge provider, only you, the account holder, hold the encryption key. (Box, Google Drive, and OneDrive are not zero-knowledge services, yet they do not support block-level incremental uploads either.) With such services you have to choose between block-level copying and zero-knowledge encryption: do you want to keep the cloud storage service from reading your files, or do you want to sync content as quickly as possible with minimum bandwidth? If you’d like to learn more, here is a good article. If you use Synthetic Backup to a .zip file, you may be able to overcome these limitations.

Copying only the changed blocks can save bandwidth and time, especially over a slow connection. When copying between local disks or in a LAN environment, it can save bandwidth too, but may not always save much copying time, because the source file has to be read in its entirety every time in order to determine the changed blocks.  In some cases it can actually slow transfers down.
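A rough back-of-the-envelope model shows why. The sketch below assumes that a full copy runs at the pace of the slower of reading and sending, while partial updating first reads the whole source file and then sends only the changed bytes (pessimistically, with no overlap). The function names and all sizes and speeds are made-up example values, not measurements of High-Sync.

```python
def full_copy_seconds(file_bytes: int, read_bps: float, send_bps: float) -> float:
    """Full copy: reading and sending overlap, so the slower side sets the pace."""
    return file_bytes / min(read_bps, send_bps)

def partial_update_seconds(file_bytes: int, changed_bytes: int,
                           read_bps: float, send_bps: float) -> float:
    """Partial update: read the whole source to find changed blocks,
    then send only the changed bytes (assumes no overlap)."""
    return file_bytes / read_bps + changed_bytes / send_bps

GB, MB = 10**9, 10**6
file_size, changed = 10 * GB, 100 * MB  # hypothetical 10 GB VM image, 100 MB changed

# Slow upload: 200 MB/s local disk, roughly 10 Mbit/s (1.25 MB/s) uplink.
print(full_copy_seconds(file_size, 200 * MB, 1.25 * MB))               # 8000.0
print(partial_update_seconds(file_size, changed, 200 * MB, 1.25 * MB))  # 130.0

# Disk to disk at 150 MB/s on both sides: reading everything first costs extra.
print(full_copy_seconds(file_size, 150 * MB, 150 * MB))                 # ~66.7
print(partial_update_seconds(file_size, changed, 150 * MB, 150 * MB))   # ~67.3
```

In this model the slow link dominates the full copy, so partial updating wins by a wide margin, whereas between two fast disks the extra full read makes it slightly slower.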

In High-Sync, block-level copying is called “Partial File Updating”. In many cases, this one checkmark is all you need to set; it is in the Special category of the profile settings (in Advanced Mode).

The program needs fast access to at least one side of the synchronization; the other side may be on a low-bandwidth connection. If you are using an Internet Protocol, please note that only SSH/SFTP supports block-level updating directly, and block-level copying with SSH/SFTP has only been implemented for uploads, not downloads. The other protocols can only be used with Synthetic Backup (see Method 3 below).

Partial File Updating can work in three ways:

Method 1: Uploading Files (works with local/network drives, VPN, or SSH/SFTP)
Method 2: Speed up using special service software on remote side
Method 3: Backing up to a ZIP rather than file sync (synthetic backup)

Method 1: Uploading Local Files
In this mode, the block-level incremental speed-up is available when you copy files from a location to which you have fast access (preferably your own hard disk). The destination can be on a slow connection, but it must be a normal file system (either LAN or VPN) or SSH/SFTP. For other connections, you can use Synthetic Backup (Method 3).

  •  Source access must be fast
  •  Destination may be slow
  •  MD5 checksums are stored in a local database
  •  Destination files must not be modified by any other profile, person, or tool
  •  Destination must be accessed via LAN, VPN, or SSH/SFTP
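The following minimal Python sketch illustrates the idea behind Method 1 under stated assumptions: per-block MD5 digests are kept in a local SQLite database, and on each run only blocks whose digest differs from the stored one are written to the destination. The block size, table schema, and the names `open_checksum_db` and `partial_update` are hypothetical; High-Sync’s actual database format is not documented here. Because the destination is never read, a change made there by another tool would go unnoticed, which is why the rule above exists.

```python
import hashlib
import sqlite3

BLOCK_SIZE = 4096  # hypothetical block size

def open_checksum_db(path: str = ":memory:") -> sqlite3.Connection:
    """Local database of per-block MD5 digests (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS blocks ("
               "file TEXT, idx INTEGER, md5 TEXT, PRIMARY KEY (file, idx))")
    return db

def partial_update(db: sqlite3.Connection, name: str,
                   source: bytes, dest: bytearray) -> int:
    """Write only the blocks of `source` whose digest differs from the stored one.

    `dest` stands in for the slow destination file. It is never read, so the
    database is the only record of what the destination contains.
    Returns the number of blocks written.
    """
    stored = dict(db.execute("SELECT idx, md5 FROM blocks WHERE file = ?", (name,)))
    written = 0
    for offset in range(0, len(source), BLOCK_SIZE):
        idx, chunk = offset // BLOCK_SIZE, source[offset:offset + BLOCK_SIZE]
        digest = hashlib.md5(chunk).hexdigest()
        if stored.get(idx) != digest:
            dest[offset:offset + BLOCK_SIZE] = chunk  # "upload" this block only
            db.execute("INSERT OR REPLACE INTO blocks VALUES (?, ?, ?)",
                       (name, idx, digest))
            written += 1
    del dest[len(source):]  # the source file may have shrunk
    block_count = (len(source) + BLOCK_SIZE - 1) // BLOCK_SIZE
    db.execute("DELETE FROM blocks WHERE file = ? AND idx >= ?", (name, block_count))
    db.commit()
    return written

db = open_checksum_db()
source = bytearray(b"A" * 10_000)  # hypothetical 10,000-byte file: 3 blocks
remote = bytearray()               # empty destination
print(partial_update(db, "mail.pst", bytes(source), remote))  # 3 (first run copies everything)
source[5_000] = ord("B")           # change one byte inside block 1
print(partial_update(db, "mail.pst", bytes(source), remote))  # 1
print(remote == source)            # True
```

The first run fills the database; from the second run on, only changed blocks travel to the destination.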

Instructions for Method 1
In your profile, check Use Partial File Updating, which is on the Special tab sheet when editing the profile in Advanced Mode. The first time you run the profile afterwards, High-Sync creates a database on your hard drive to store the information needed for the speed-up. From the second run on, you should notice the speed-up.

Method 2: Use Remote Service for Additional Speed
This method is similar to Method 1 but uses an additional speed-up technique and can update large files in both directions: the remote computer can be either the source or the destination. This is achieved by running a small service application on the remote computer, which creates the necessary checksums on the fly when requested by the main application running on a different machine.

The other (local) computer, where the main High-Sync program is running, needs normal file system access to the remote computer (LAN or VPN), or it can use SSH. It needs write access to the remote computer so that it can save the checksum request file there. The MD5 checksums are created when needed, so no database is used.

Instructions for Method 2
On the remote system, run the Setup program and install the High-Sync Remote Service along with its control panel. Start the control panel from the High-Sync group in the Start menu. On the tab sheet Configure Checksummer, enter the base folders that will be used for synchronization. Click Apply. On the tab sheet Service Configuration, click on Install Service and Start. The service runs under the Windows System account by default. If this account doesn’t have sufficient access privileges, you may have to change the account in Windows Control Panel -> Administrative Tools -> Services.

On the local system, you are running the main High-Sync program. In your profile, the right-hand side must be the remote system. Specify one of the folders that you configured the remote service to monitor. The left side should be your local folders, or a network drive with relatively fast access. On the Special tab sheet in Advanced Mode, make the following checkmarks: Use Partial File Updating and Right side uses Remote Service.

  • High-Sync Remote Service computes MD5 checksums on remote computer
  • The “slow” side can be either the source or the destination
  • MD5 checksums are newly calculated each time
  • Files on both sides can be modified by other profiles, persons, or tools
  • One side must be local or LAN/VPN, the other can be LAN, VPN, or SSH/SFTP
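The minimal Python sketch below models the Method 2 idea under stated assumptions: a stand-in for the remote service computes block checksums fresh each time it is asked, so nothing is cached and either side may have been changed by another tool. The local side compares against its own freshly computed checksums and transfers only differing blocks, in either direction. The class `RemoteChecksummer`, its methods, the functions `upload` and `download`, and the block size are hypothetical names for illustration; the real service communicates through a checksum request file, which is not modeled here.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size

def digests(data: bytes) -> list[str]:
    """MD5 digest of every fixed-size block, computed fresh on each call."""
    return [hashlib.md5(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

class RemoteChecksummer:
    """Hypothetical stand-in for the remote service: it has fast access to
    its own files and computes checksums only when asked; nothing is cached."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    def checksums(self, name: str) -> list[str]:
        return digests(bytes(self.files.get(name, b"")))

    def size(self, name: str) -> int:
        return len(self.files.get(name, b""))

    def read_block(self, name: str, idx: int) -> bytes:
        return bytes(self.files[name][idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE])

    def write_block(self, name: str, idx: int, chunk: bytes) -> None:
        f = self.files.setdefault(name, bytearray())
        f[idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE] = chunk

    def truncate(self, name: str, size: int) -> None:
        del self.files.setdefault(name, bytearray())[size:]

def upload(local: bytes, remote: RemoteChecksummer, name: str) -> int:
    """Send only the local blocks whose digest differs from the remote one."""
    theirs, sent = remote.checksums(name), 0
    for idx, digest in enumerate(digests(local)):
        if idx >= len(theirs) or theirs[idx] != digest:
            remote.write_block(name, idx, local[idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE])
            sent += 1
    remote.truncate(name, len(local))
    return sent

def download(local: bytearray, remote: RemoteChecksummer, name: str) -> int:
    """Fetch only the remote blocks whose digest differs from the local one."""
    ours, fetched = digests(bytes(local)), 0
    for idx, digest in enumerate(remote.checksums(name)):
        if idx >= len(ours) or ours[idx] != digest:
            local[idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE] = remote.read_block(name, idx)
            fetched += 1
    del local[remote.size(name):]
    return fetched

remote = RemoteChecksummer()
local = bytearray(b"x" * 10_000)
print(upload(bytes(local), remote, "vm.vhdx"))  # 3
local[100] = ord("y")
print(upload(bytes(local), remote, "vm.vhdx"))  # 1
remote.files["vm.vhdx"][9_000] = ord("z")       # another tool edits the remote copy
print(download(local, remote, "vm.vhdx"))       # 1
print(local == remote.files["vm.vhdx"])         # True
```

Because both sides recompute their checksums on every run, the external edit on the remote side is detected without any stored database.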

Method 3: Synthetic Backup to Zip files
This feature is intended for backing up from local storage to any type of backup storage. However, it does not try to keep individual files on the destination storage; instead, files are stored in .ZIP archives. This is a backup-program technique rather than a sync-program technique: image-based backup products on the market also tend to store all files in one big file, and that is what is being done here as well. Choose “Synthetic Backup” on the tab sheet Versioning->Synthetic Backup. This will automatically set these additional checkmarks:

  • Use Partial File Updating (under Special)
  • Filename Encoding (under Versioning)
  • Zip Each File Individually (under Zip/Encryption)

In practice, this method:

  • adds Zip compression, versioning, and filename encoding
  • works locally or with any connection type or Internet Protocol on the destination side
  • uploads the changed blocks in a new, separate zip file every time
  • requires all older zip files to stay on the backup storage, although they can be thinned out
  • like Method 1, requires that destination files are not modified by any other profile, person, or tool; restores are intended to be done from within High-Sync
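The following minimal Python sketch shows why this approach works with any destination protocol: each run only creates a new zip archive holding the changed blocks, and never modifies earlier archives, so the destination just has to accept new files. Restoring applies the archives oldest first. The archive layout (one entry per block plus a `manifest.json`), the block size, and the names `make_increment` and `restore` are hypothetical; High-Sync’s actual synthetic backup format is not documented here, and thinning out older archives (which would require merging their blocks into a newer base) is not shown.

```python
import hashlib
import io
import json
import zipfile

BLOCK_SIZE = 4096  # hypothetical block size

def make_increment(data: bytes, known: dict[int, str]) -> tuple[bytes, dict[int, str]]:
    """Build a new zip archive containing only blocks whose MD5 differs from `known`.

    Earlier archives are never touched. Returns the archive bytes and the
    digest map to pass to the next run.
    """
    current: dict[int, str] = {}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for offset in range(0, len(data), BLOCK_SIZE):
            idx, chunk = offset // BLOCK_SIZE, data[offset:offset + BLOCK_SIZE]
            current[idx] = hashlib.md5(chunk).hexdigest()
            if known.get(idx) != current[idx]:
                zf.writestr(f"block{idx:08d}", chunk)
        zf.writestr("manifest.json", json.dumps({"size": len(data)}))
    return buf.getvalue(), current

def restore(archives: list[bytes]) -> bytes:
    """Rebuild the newest version by applying every archive, oldest first."""
    blocks: dict[int, bytes] = {}
    size = 0
    for archive in archives:
        with zipfile.ZipFile(io.BytesIO(archive)) as zf:
            size = json.loads(zf.read("manifest.json"))["size"]
            for name in zf.namelist():
                if name.startswith("block"):
                    blocks[int(name[len("block"):])] = zf.read(name)
    block_count = (size + BLOCK_SIZE - 1) // BLOCK_SIZE
    return b"".join(blocks[i] for i in range(block_count))

v1 = b"a" * 10_000
a1, known = make_increment(v1, {})
v2 = v1[:5_000] + b"b" + v1[5_001:]  # one byte changed inside block 1
a2, known = make_increment(v2, known)
print(zipfile.ZipFile(io.BytesIO(a2)).namelist())  # ['block00000001', 'manifest.json']
print(restore([a1, a2]) == v2)  # True
print(restore([a1]) == v1)      # True: the older version stays recoverable
```

Since the second archive depends on blocks stored in the first, the older archives must remain on the backup storage, matching the requirement in the list above.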