Compression and archives
On Linux, archiving and compression are separate steps: files and directories are first gathered into a single archive, and the archive is then optionally compressed.
Compression methods
- The following compression methods all provide lossless compression:
- gzip
- Replaced the old compress program. Replaces the original file with a compressed version that has a .gz file extension. Use gunzip to reverse the operation.
- bzip2
- Compressed Linux kernel until 2013. Replaces original file.
- xz
- Popular with Linux admins. Has compressed the Linux kernel since 2013. Replaces original file.
- zip
- Can operate on multiple files and packs them together in a ‘folder’ or ‘archive’. Does not replace the original file; it places a copy of the file into the archive.
# use the -<n> option (-1 to -9) to control the compression level
# -1 is fastest but compresses least
# -9 is slowest but compresses most
# -6 is the default for gzip, xz, and zip; bzip2 defaults to -9
$ ls -lh wtmp?
-rw-rw-r-- 1 137K Apr 4 22:08 wtmp1
-rw-rw-r-- 1 137K Apr 4 22:08 wtmp2
-rw-rw-r-- 1 137K Apr 4 22:08 wtmp3
-rw-rw-r-- 1 137K Apr 4 22:08 wtmp4
# compress each file
gzip wtmp1 # gunzip <compressed-file> to reverse
bzip2 wtmp2 # bunzip2 <compressed-file> to reverse
xz wtmp3 # unxz <compressed-file> to reverse
zip wtmp4.zip wtmp4 # unzip <compressed-file> to reverse
adding: wtmp4 (deflated 96%)
# xz compresses the best
ls -lh wtmp?.*
-rw-rw-r-- 1 5.6K Apr 4 22:08 wtmp1.gz
-rw-rw-r-- 1 4.4K Apr 4 22:08 wtmp2.bz2
-rw-rw-r-- 1 3.9K Apr 4 22:08 wtmp3.xz
-rw-rw-r-- 1 5.7K Apr 4 22:18 wtmp4.zip
# zip did not replace the file, still there
$ ls wtmp?
wtmp4
tar
Bundles project files into a single output file for easy transfer across the network.
- Preserves folder structure and ownership.
- Does not delete original files or directories.
- .tar.gz is the same as .tgz
tar [OPTIONS] <tarfile-name> [FILES...]
# - : output tar contents to STDOUT (dash in place of archive filename)
# -c: create new archive
# -C: specify directory for extracted tar contents (with -x)
# --delete -f: delete file from archive
# --exclude="pattern": exclude objects that match pattern
# -f: archive's filename (put last)
# -j: use bzip2 compression instead of gzip
# -p: preserve permissions (during extract)
# -r: append files to an existing tar archive (uncompressed archives only)
# -t: list contents of archive
# -v: verbose
# -x: extract files
# -z: gzip or gunzip (depends whether -c or -x flag is present)
# all files in pwd
tar -cvf all-files.tar *
tar -cvzf myinits-$(date -I).tar.gz /etc/init.d/ # with gzip
tar -cvjf myinits-$(date -I).tar.bz2 /etc/init.d/ # with bzip2
ls -lhog myinits-2024-12-15.tar.bz2 myinits-2024-12-15.tar.gz # gzip vs bzip2
-rw-r--r-- 1 16K Dec 15 11:14 myinits-2024-12-15.tar.bz2
-rw-r--r-- 1 18K Dec 15 11:09 myinits-2024-12-15.tar.gz
# extract single and multiple files from zipped archive
tar -xvf myinits-2024-12-15.tar.gz etc/init.d/anacron
tar -xvf myinits-2024-12-15.tar.gz "etc/init.d/anacron" "etc/init.d/apache2"
# create a gzip-compressed archive of files matching a pattern
tar -czvf number-files.tar.gz *.[0-9]*
ls -logh number-files*
-rw-rw-r-- 1 30K Nov 8 14:15 number-files.tar
-rw-rw-r-- 1 477 Nov 8 14:17 number-files.tar.gz
# view files in tar
tar -tvf number-files.tar.gz
-rw-rw-r-- linuxuser/linuxuser 23 2024-11-08 13:48 file.1
...
# unzip and extract to a dir
tar -zxvf number-files.tar.gz -C extractions/
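The create, list, and extract flags can be exercised together in a scratch directory. A minimal sketch, assuming a tar that supports --exclude (GNU tar and bsdtar both do); the project/ tree and its file names are hypothetical and not taken from the commands in this section.

```shell
set -eu

# Scratch directory with a small hypothetical project tree.
work=$(mktemp -d)
cd "$work"
mkdir -p project/logs extractions
echo "main" > project/app.conf
echo "noise" > project/logs/debug.log

# Create a gzip-compressed archive, skipping anything that matches *.log.
tar --exclude='*.log' -czf project.tar.gz project

# List the archive contents without extracting.
listing=$(tar -tzf project.tar.gz)
echo "$listing"

# Extract a single member into another directory with -C.
tar -xzf project.tar.gz -C extractions project/app.conf
ls extractions/project
```

The member name passed to -x has to match the path exactly as it appears in the -t listing.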
# create archive on remote
# pipe tar command to remote computer and create archive
tar -czvf - *.[0-9]* | ssh username@10.20.30.40 \
"cat > /home/path/to/dest/remote-tars.tar.gz"
# create archive on remote for only the current filesystem, plus /var and /usr
# might require sudo
tar -czvf - --one-file-system / /var /usr \
| ssh username@10.20.30.40 \
"cat > /home/path/to/dest/workstation-backup-Nov-8.tar.gz"
# delete file from archive (uncompressed archives only)
tar --delete -f <tarfile-name> <file>
# extract and maintain permissions requires sudo
sudo tar -xzvf perms.tar.gz
ls -l
total 4
-rw-rw-r-- 1 linuxuser linuxuser 0 Nov 10 22:54 file1
-rw-rw-r-- 1 linuxuser linuxuser 0 Nov 10 22:54 file2
-rw-rw-r-- 1 newuser newuser 0 Nov 10 22:54 file3
-rw-rw-r-- 1 linuxuser linuxuser 157 Nov 10 22:58 perms.tar.gz
# decompress a .tar.gz with gunzip (leaves file.tar)
gunzip file.tar.gz
# check archive size without writing it to disk: send the archive to STDOUT and count the bytes
tar -czf - myinits-2024-12-15.tar.gz | wc -c
17868
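Writing the archive to STDOUT with a dash as the archive name, as the remote and size-check commands do, can be tried without a second machine by piping one tar into another. A minimal sketch: the second tar stands in for the ssh side, and the src/ and dest/ directories are hypothetical.

```shell
set -eu

# Hypothetical source tree and an empty destination.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dest"
echo "hello" > "$work/src/file.1"
echo "world" > "$work/src/sub/file.2"

# First tar writes the archive to STDOUT (-f -);
# second tar reads it from STDIN and unpacks into dest.
tar -czf - -C "$work/src" . | tar -xzf - -C "$work/dest"

# Confirm both trees hold the same files with the same contents.
diff -r "$work/src" "$work/dest" && copy_status=identical
echo "copy: $copy_status"

# The same stream can be measured instead of unpacked.
stream_bytes=$(($(tar -czf - -C "$work/src" . | wc -c)))
echo "compressed stream: $stream_bytes bytes"
```

Using -C on the creating side keeps absolute paths out of the archive, so the extracted tree lands directly under the destination.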
split
Breaks an archive into multiple smaller files for transfer. The pieces are joined back together with cat on the remote.
# split into 100 byte archive files
split -b 100 number-files.tar.gz "number-files.tar.gz."
ls -l *.gz.*
-rw-rw-r-- 1 linuxuser linuxuser 100 Nov 8 14:41 number-files.tar.gz.aa
-rw-rw-r-- 1 linuxuser linuxuser 100 Nov 8 14:41 number-files.tar.gz.ab
-rw-rw-r-- 1 linuxuser linuxuser 100 Nov 8 14:41 number-files.tar.gz.ac
...
# rebuild tar with cat
cat number-files.tar.gz.* > splits/number-recreated.tar.gz
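Before relying on a rebuilt archive, it is worth confirming that it matches the original byte for byte. A minimal sketch, assuming a hypothetical numbers.txt payload and a 1000-byte piece size chosen only for illustration:

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Build a small archive to split (numbers.txt is a hypothetical payload).
seq 1 5000 > numbers.txt
tar -czf numbers.tar.gz numbers.txt

# Split into 1000-byte pieces named numbers.tar.gz.aa, .ab, ...
split -b 1000 numbers.tar.gz numbers.tar.gz.
piece_count=$(($(ls numbers.tar.gz.* | wc -l)))
echo "pieces: $piece_count"

# The shell expands the glob in sorted order, so cat joins the pieces
# in the same order split wrote them.
cat numbers.tar.gz.* > rebuilt.tar.gz

# Byte-for-byte comparison against the original archive.
cmp -s numbers.tar.gz rebuilt.tar.gz && rebuild_status=identical
echo "rebuild: $rebuild_status"
```

When the original is not available on the receiving side, comparing checksums of the original and the rebuilt file serves the same purpose.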