Protecting files

Backup Rule of Three (also known as the 3-2-1 rule): always keep 3 copies of your data:

  • 1 remote system in case of disaster
  • 2 local copies, on different media

A backup is sometimes called an archive: a group of files with associated metadata. It is a copy of data that can be restored in the future if the original data is lost or corrupted. When planning backups, consider the following:

  • Backup type
  • Compression methods
  • Utilities that will help the most

Understanding backup types

System image
A clone: a copy of the OS binaries, configuration files, and anything else needed to boot the system.
Full
Copy of all data, regardless of modification date. Restores system data quickly, but takes a long time to create.
Incremental
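Copy of only the data modified since the last backup of any kind (full or incremental), determined by comparing timestamps. Quick to create, but restoring can take a long time because the full backup and every incremental backup after it must be restored in order.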
Differential
Copy of all data that changed since the last full backup. A good balance between full and incremental backups: a restore needs only the full backup plus the latest differential.
Snapshot
Hybrid approach: a full (usually read-only) copy of the data is made to backup media. Then pointers (e.g., hard links) are used to create a reference table linking the backup data with the original data. During the next backup, only modified files are copied to backup media, and the pointer reference table is copied and updated.

You can go back to any point in time (restore point) and restore the data from there. This approach is very efficient, using less space and processing power than making repeated full backups.

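rsync uses the snapshot approach; for example, with its --link-dest option, files unchanged since a previous backup are hard-linked to that backup instead of being copied again.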

Snapshot clone
After a snapshot is created, it is cloned. Useful in high I/O environments. The clone is modifiable and mountable, so you can use it for disaster recovery.
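
The pointer idea behind snapshots can be sketched with plain hard links. This is a minimal illustration, not rsync's own implementation: it assumes GNU coreutils (cp -al, stat -c), and the directory names data, snap.0, and snap.1 are hypothetical. Unchanged files in the second snapshot are hard links to the first, so only the modified file takes up new space.

```shell
#!/usr/bin/env bash
# Minimal sketch of hard-link ("pointer") snapshots using GNU cp -al.
# Directory names (data, snap.0, snap.1) are hypothetical.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
echo 'one' > data/a.txt
echo 'two' > data/b.txt

# Snapshot 0: a full copy of the data
cp -a data snap.0

# A user then edits one file
echo 'two, edited' > data/b.txt

# Snapshot 1: hard-link every file in snap.0 (no file data is copied) ...
cp -al snap.0 snap.1
# ... then replace only the modified file. Remove the link first so that
# writing the new copy does not also change snap.0/b.txt.
rm snap.1/b.txt
cp -a data/b.txt snap.1/b.txt

# a.txt shows the same inode number in both snapshots (one copy on disk);
# b.txt shows two different inodes (two versions)
stat -c '%i %n' snap.0/a.txt snap.1/a.txt snap.0/b.txt snap.1/b.txt
cat snap.0/b.txt    # prints: two
cat snap.1/b.txt    # prints: two, edited
```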

Compression methods

The following compression methods all provide lossless compression:
gzip
Replaced the older compress program. Replaces the original file with a compressed version that has a .gz extension; use gunzip to reverse the operation.
bzip2
Used to compress the Linux kernel until 2013. Replaces the original file, adding a .bz2 extension; use bunzip2 to reverse the operation.
xz
Popular with Linux admins. Has been used to compress the Linux kernel since 2013. Replaces the original file, adding a .xz extension; use unxz to reverse the operation.
zip
Can operate on multiple files, packing them together into a single archive (a ‘folder’). Does not replace the original file; it places a copy of the file into the archive.
# use the -<n> option to control the compression level:
# -1 is fast but gives the lowest compression
# -9 is slow but gives the highest compression
# -6 is the default for gzip, xz, and zip; bzip2 defaults to -9
ls -lh wtmp?
-rw-rw-r-- 1 137K Apr  4 22:08 wtmp1
-rw-rw-r-- 1 137K Apr  4 22:08 wtmp2
-rw-rw-r-- 1 137K Apr  4 22:08 wtmp3
-rw-rw-r-- 1 137K Apr  4 22:08 wtmp4

# compress each file
gzip wtmp1      # gunzip <compressed-file> to reverse
bzip2 wtmp2     # bunzip2 <compressed-file> to reverse
xz wtmp3        # unxz <compressed-file> to reverse 
zip wtmp4.zip wtmp4     # unzip <compressed-file> to reverse
  adding: wtmp4 (deflated 96%)

# xz produced the smallest file for this sample
ls -lh wtmp?.*
-rw-rw-r-- 1 5.6K Apr  4 22:08 wtmp1.gz
-rw-rw-r-- 1 4.4K Apr  4 22:08 wtmp2.bz2
-rw-rw-r-- 1 3.9K Apr  4 22:08 wtmp3.xz
-rw-rw-r-- 1 5.7K Apr  4 22:18 wtmp4.zip

# zip did not replace the original file; it is still there
ls wtmp?
wtmp4
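
The -1/-9 trade-off, and the fact that these methods are lossless, can be checked with a short script. This sketch uses gzip only and assumes GNU coreutils and gzip; the sample file name is hypothetical. The -c option writes to standard output, so the original file is kept (gzip otherwise replaces it).

```shell
#!/usr/bin/env bash
# Sketch: gzip -1 (fastest) vs. -9 (best compression) on the same input,
# followed by a lossless round trip.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Repetitive sample data (a hypothetical stand-in for a log file such as wtmp)
seq 1 200000 > sample.txt

# -c writes compressed data to stdout, so sample.txt is not replaced
gzip -1 -c sample.txt > fast.gz
gzip -9 -c sample.txt > best.gz
ls -l sample.txt fast.gz best.gz

# Decompress and compare byte for byte with the original
gunzip -c best.gz > restored.txt
cmp sample.txt restored.txt && echo 'round trip OK'
```

bzip2 and xz accept the same -1 to -9 and -c options, so the gzip lines can be swapped for them.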

Archive and restore utilities

cpio

cpio (copy in and copy out) gathers copies of files and stores them in an archive file. It is good for system image and full backups because it maintains each file’s absolute directory path.

# Pipe list of files into cpio
ls file-list | cpio -ov > output.cpio 
-I # specifies the archive file to use
-i # extract, copies files from archive or displays files within the archive
--no-absolute-filenames # in copy-in mode, create all files relative to the current directory
-o # copy-out mode, creates archive by copying files into it
-t # displays list of files within the archive
-v # verbose

# view files
ls
Project42.txt  Project43.txt  Project44.txt  Project45.txt  Project46.txt

# pipe files into cpio
ls Project4?.txt | cpio -ov > Project4x.cpio
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
1 block

# view files
ls Project4?.*
Project42.txt  Project43.txt  Project44.txt  Project45.txt  Project46.txt  Project4x.cpio

# create archive of all files owned by <username>
find / -user <username> | cpio -ov > archive.cpio

# list archive contents
cpio -itvI Project4x.cpio 
-rw-rw-r--   1 ryanseym ryanseym        0 Apr  4 22:48 Project42.txt
-rw-rw-r--   1 ryanseym ryanseym        0 Apr  4 22:48 Project43.txt
-rw-rw-r--   1 ryanseym ryanseym        0 Apr  4 22:48 Project44.txt
-rw-rw-r--   1 ryanseym ryanseym        0 Apr  4 22:48 Project45.txt
-rw-rw-r--   1 ryanseym ryanseym        0 Apr  4 22:48 Project46.txt
1 block

# restore files (the original .txt files were removed first)
ls
Project4x.cpio

# to restore files to their original location, use the -ivI options
# add --no-absolute-filenames to restore relative to the current directory (e.g., a different directory)
cpio -iv --no-absolute-filenames -I Project4x.cpio 
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
1 block

# the restored files now sit alongside the .cpio archive
ls
Project42.txt  Project43.txt  Project44.txt  Project45.txt  Project46.txt  Project4x.cpio

tar

tar (tape archiver) is popular for creating data backups. It collects files and stores them in an archive called a tar file. If the tar file is compressed, it is called a tarball.

Because it is a tape archiver, you can write tar files and tarballs to tape, such as a SCSI tape device: pass the device filename to -f instead of a regular filename, for example -f /dev/st0.

  • It is good practice to use the .tar extension.
  • When you use compression, add the compression method to the extension: .tar.gz or .tgz for gzip, .tar.bz2 for bzip2, .tar.xz or .txz for xz.
  • Use the .snar extension for a tar snapshot file.
tar [OPTIONS...] [FILENAME]...
-c # create a tar file
-f # archive file name, should use .tar extension
-u # update: appends files to an existing tar file, but only files that
   # are newer than their copies already in the archive
-g # creates incremental or full archive based on metadata stored in provided file
-z # compresses using gzip
-j # compresses using bzip2
-J # compresses using xz
-v # verbose

# create tar file
tar -cvf Project4x.tar Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt

# tar file with gzip compression (tarball)
tar -zcvf Project4x.tar.gz Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt

tar full and incremental backups

tar views full and incremental backups in levels:

  • level 0 includes all files
  • level 1 is first incremental backup
  • level 2 is second incremental backup, etc…
tar [OPTIONS...] [FILENAME]...
-d # compare tar archive file members with external files 
-t # display tar archive file's contents (members)
-W # verify each file as it is processed. Can't use with compression.

# 1. create a level 0 (full) backup; -g also creates the FullArchive.snar snapshot
#    file, whose timestamp metadata is used to create later backups
tar -g FullArchive.snar -Jcvf Project42.txz Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt

# 2. verify created
ls FullArchive.snar Project42.txz
FullArchive.snar  Project42.txz

# 3. Update file
echo 'Answer to everything' >> Project42.txt 

# 4. create an incremental backup. Project42_Inc.txz contains only Project42.txt
#    because it's the only file modified since the previous backup.
tar -g FullArchive.snar -Jcvf Project42_Inc.txz Project4?.txt
Project42.txt

# view tarball files/members
tar -tf Project4x.tar.gz 
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt

# compare archive members against the current files (Project42.txt was modified in step 3)
tar -df Project4x.tar.gz 
Project42.txt: Mod time differs
Project42.txt: Size differs

# -W verifies the archive after it is written. It can't be combined with
# compression, so compress the verified archive in a separate step.
tar -Wcvf ProjectVerify.tar Project4?.txt
Project42.txt
...
Verify Project42.txt
...
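
tar's -g option produces incremental backups, but it can also produce differential backups (every backup relative to the level 0 full backup). A minimal sketch, assuming GNU tar: before each differential run, copy the level 0 snapshot file and pass the copy to -g, so the original .snar file is never updated. All file and directory names here are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: differential backups with GNU tar -g, reusing a copy of the
# level 0 snapshot file. All file and directory names are hypothetical.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
echo 'a' > data/a.txt
echo 'b' > data/b.txt

# Level 0 (full) backup; tar records file metadata in full.snar
tar -g full.snar -cf full.tar data

sleep 1                    # ensure later edits get newer timestamps
echo 'a2' >> data/a.txt

# Differential 1: give tar a fresh copy of the level 0 snapshot file
cp full.snar diff.snar
tar -g diff.snar -cf diff1.tar data

sleep 1
echo 'b2' >> data/b.txt

# Differential 2: again start from the level 0 snapshot file, so every
# file changed since the full backup is included
cp full.snar diff.snar
tar -g diff.snar -cf diff2.tar data

echo '--- diff1.tar'; tar -tf diff1.tar
echo '--- diff2.tar'; tar -tf diff2.tar
```

Because each run starts from the untouched level 0 snapshot, diff2.tar contains both a.txt and b.txt; reusing a single .snar file, as in the incremental steps above, would have put only b.txt in the second backup.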

tar restore

Restoring uses basically the same command as creating an archive, but substitute -x for -c:

tar [OPTIONS...] [FILENAME]...
-x # extract files from tarball or archive and place in cwd
-z # decompresses with gunzip
-j # decompresses with bunzip2
-J # decompresses with unxz

# extract gzip tarball (tarball is not removed)
tar -zxvf Project4x.tar.gz 
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
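
A restore is only useful if the extracted files match the originals. This sketch assumes GNU tar and diffutils, and its names are hypothetical. It extracts into a separate directory with -C (tar changes to that directory before extracting), then compares the trees with diff -r:

```shell
#!/usr/bin/env bash
# Sketch: restore a gzip tarball into a separate directory and check that
# it matches the originals. Directory and archive names are hypothetical.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir project
for n in 42 43 44; do
    echo "file $n" > "project/Project$n.txt"
done

# c = create, z = gzip, f = archive file name
tar -zcf project.tar.gz project

# x = extract; -C changes to restore/ (which must already exist) first
mkdir restore
tar -zxf project.tar.gz -C restore

# diff -r prints nothing and exits 0 when both trees are identical
diff -r project restore/project && echo 'restore matches original'
```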

dd

dd creates low-level copies of an entire hard drive or partition (including the MBR). Common uses include:

  • creating system images for forensics
  • copying damaged disks
  • wiping partitions
# the input and output devices are each either an entire drive or a partition
dd if=<input-device> of=<output-device> [OPERANDS]
bs=BYTES        # sets max block size to read and write. Default is 512
count=N         # sets number of input blocks to copy
status=level    # sets amount of info to display to STDERR. Set to one of the following:
                #   none: displays only error messages
                #   noxfer: does not display final transfer stats
                #   progress: displays periodic transfer stats

# copy entire disk
# 1. list all block devices to make sure that drives are not mounted
lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
...

# 2. copy /dev/sdb to /dev/sdc
dd if=/dev/sdb of=/dev/sdc status=progress

# zero (wipe) a disk if you are throwing it away
# /dev/zero device file writes zeroes to the disk
dd if=/dev/zero of=/dev/sdc status=progress
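
Because dd also accepts regular files for if= and of=, its operands can be practiced on image files instead of real disks. A minimal sketch, assuming GNU dd (for status=none) and bash; disk.img, copy.img, and mbr.img are hypothetical stand-ins for /dev/sdb, /dev/sdc, and a saved MBR:

```shell
#!/usr/bin/env bash
# Sketch: practicing dd operands on image files instead of real disks.
# disk.img, copy.img, and mbr.img are hypothetical stand-ins.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A 1 MiB "disk" of random data: count=1024 blocks of bs=1024 bytes
dd if=/dev/urandom of=disk.img bs=1024 count=1024 status=none

# Copy the whole "disk", like: dd if=/dev/sdb of=/dev/sdc
dd if=disk.img of=copy.img bs=4096 status=none
cmp disk.img copy.img && echo 'full copy matches'

# Copy only the first 512-byte block, where an MBR would live
dd if=disk.img of=mbr.img bs=512 count=1 status=none
stat -c '%s %n' mbr.img        # prints: 512 mbr.img

# "Wipe" the copy by overwriting it with zeros from /dev/zero
dd if=/dev/zero of=copy.img bs=1024 count=1024 status=none
if [ "$(tr -d '\0' < copy.img | wc -c)" -eq 0 ]; then
    echo 'copy.img is now all zeros'
fi
```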

Replication

rsync

rsync is better than scp for large files. It is great at backing up large files over the network, and it can also back up files locally:

rsync [OPTION]... SOURCE DEST
-e # specifies the remote shell program for the network connection; ssh (OpenSSH) by default
-z # compresses data with zlib during transfer; good for slow network connections

# previously discussed options
-a # archive mode, equivalent to -rlptgoD (dir tree backups)
-D # retain Device and special files
-g # retain file group
-h # human-readable numeric output
-l # copy symbolic links
-o # retain file owner
-p # retain file perms
-P # same as --partial --progress: keeps partially transferred files and displays progress
-r # recursive
--stats # display file transfer stats
-t # retain file's modification time
-v # verbose

# backup files locally
rsync -avh *.tar TarStorage/
sending incremental file list
Project4x.tar
ProjectVerify.tar

sent 20.67K bytes  received 54 bytes  41.46K bytes/sec
total size is 20.48K  speedup is 0.99

ls TarStorage/
Project4x.tar  ProjectVerify.tar

# securely over the network (-e option uses OpenSSH)
rsync -avP -e ssh *.tar user1@10.20.30.40:~/path/to/target_dir

Offsite/Off-System backups

scp

Secure Copy Protocol

  • Quickly transfers files in a noninteractive manner between two systems on a network
  • Uses SSH
  • Best for small copies you need on the fly, because if a transfer gets interrupted, you cannot pick up where you left off
  • Will overwrite files on the remote host if they have the same name:
scp [FILE] user@10.20.30.40:/absolute/path/to/target-dir
-C # compresses the file during transfer
-p # preserves file access, modification times, and perms
-r # recursive copy
-v # verbose

# copy from remote to remote
scp user@10.20.30.40:~/home user2@10.20.30.41:~/home/files

sftp

SSH File Transfer Protocol.

Good for transferring larger files or archives. It provides an interactive experience; for example, you can:

  • create directories as needed
  • immediately check on transferred files
  • determine the remote system's pwd
sftp username@localhost
username@localhost's password: 
Connected to localhost.
sftp> 

# common commands
bye # exits and quits sftp
exit # exits and quits sftp
get # gets file (downloads) from remote to local machine
reget # resumes interrupted get operation
put # sends (uploads) files from local to remote
reput # resumes interrupted put operation
ls # list files in remote pwd
lls # list files in local pwd
mkdir # create dir on remote
lmkdir # create dir on local
progress # toggle progress display on/off (default is on)

# upload file to remote (localhost, here)
sftp username@localhost
username@localhost's password: 
Connected to localhost.

# check remote pwd
sftp> ls
Desktop                  Development              Documents                Downloads                
...

# check local pwd  
sftp> lls
Extract		  Project42_Inc.txz  Project43.txt  Project46.txt      TarStorage
FullArchive.snar  Project42.txt      Project44.txt  Project4x.tar
not-absolute	  Project42.txz      Project45.txt  ProjectVerify.tar

# make directory on local (would normally be remote, but we use localhost)
sftp> lmkdir sftp_files

# verify the dir was made (would be 'ls' on remote)
sftp> lls
Extract		  Project42_Inc.txz  Project43.txt  Project46.txt      sftp_files
...

# upload file (per the output below, the remote pwd at this point is the sftp_files directory)
sftp> put Project4x.tar
Uploading Project4x.tar to /home/username/linux-playground/archives/sftp_files/Project4x.tar
Project4x.tar                                                     100%   10KB   7.5MB/s   00:00

# check for file on remote
sftp> ls
Project4x.tar   

# exit
sftp> bye

Verify backup integrity

You need to check that your archives or files were not corrupted during transfer.

md5sum

md5sum uses the MD5 message digest algorithm. MD5 was once used in cryptography, but because of known vulnerabilities it is now used mostly for file integrity checking.

  • Produces 128-bit hash value
md5sum FILE
md5sum Project4x.tar 
b1b68da5c780424b4e6949305b76541d  Project4x.tar

Other hash algorithms

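SHA-512 produces the longest hash (512 bits) of these algorithms and is preferred for security purposes; it is also used, with a salt, to hash user passwords on many Linux systems: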

# find the SHA-2 checksum utilities on the system (-1 lists one per line)
ls -1 /usr/bin/sha???sum
/usr/bin/sha224sum
/usr/bin/sha256sum
/usr/bin/sha384sum
/usr/bin/sha512sum

# md5sum, for comparison
md5sum Project4x.tar 
b1b68da5c780424b4e6949305b76541d  Project4x.tar

# sha224sum
sha224sum Project4x.tar 
d3db117bb61330e6ed409d9d7c6f56c0db158c6e0dbf02784380a077  Project4x.tar

# sha256sum
sha256sum Project4x.tar 
d8c1332f37299100d2c2f6e69f4e7efe41cfd56986deedd439dd194b75d9a0ce  Project4x.tar

# sha384sum
sha384sum Project4x.tar 
e3f430cfecee455fb31b575421cff6ae438132e633a6446aa1bbf0e3dedbd28c5a56a810e51efbb8951a4a308a22d1d6  Project4x.tar

# sha512sum
sha512sum Project4x.tar 
6deb50480ff3b9c4b4205147b8376ce84e28be875a6ed9fde24937f6287975bd31d0e51412e7cb5647185b2cedea5973e1ca91c2e89a5c4638213384464e69c6  Project4x.tar
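
In practice, you compute the checksum before a transfer and compare it with one computed afterward. With GNU coreutils, sha256sum -c (md5sum -c works the same way) reads a saved checksum file, re-hashes each listed file, and reports OK or FAILED. A sketch in which a local cp stands in for scp or sftp; all file names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: save a SHA-256 checksum before a "transfer", then verify the copy.
# A local cp stands in for scp/sftp; all file names are hypothetical.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir src dest
seq 1 1000 > src/notes.txt
tar -cf src/backup.tar -C src notes.txt

# 1. On the source system: save the checksum in a file
(cd src && sha256sum backup.tar > backup.tar.sha256)

# 2. "Transfer" the archive and its checksum file
cp src/backup.tar src/backup.tar.sha256 dest/

# 3. On the destination: -c re-hashes each listed file and compares
(cd dest && sha256sum -c backup.tar.sha256)     # prints: backup.tar: OK

# Simulated corruption makes sha256sum -c report FAILED and exit non-zero
echo 'garbage' >> dest/backup.tar
if ! (cd dest && sha256sum -c --quiet backup.tar.sha256); then
    echo 'corruption detected'
fi
```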