The data compression program

  • A digression: Taken from bzip2.txt 9/11/10 v1.06


    a block-sorting file compressor

    bzip2 [ -cdfkqstvzVL123456789 ] [ filenames … ]
    bunzip2 [ -fkvsVL ] [ filenames … ]
    bzcat [ -s ] [ filenames … ] # - decompresses files to stdout
    bzip2recover filename # - recovers data from damaged bzip2 files

    The command-line options are deliberately very similar to those of GNU gzip.

    Files named on the command line (or expanded by globbing) are replaced by a compressed version with the name suffixed by .bz2.
    Compressed files retain ownership, permissions, and modification date (access and change dates are not preserved).

    Existing output files are not overwritten; specify --force to overwrite them.

    If no file names are specified, bzip2 reads from standard input and writes to standard output (useful for piping elsewhere).

    bunzip2 (or bzip2 -d) decompresses all specified files. Files that were not created by bzip2 are skipped with a warning.
    The name of the decompressed file is derived from that of the compressed file as follows:

    filename.bz2 → filename
    filename.bz → filename
    filename.tbz2 → filename.tar
    filename.tbz → filename.tar
    anyothername → anyothername.out

    If the file does not end in a recognised ending, .bz2, .bz, .tbz2 or .tbz, bzip2 warns that it cannot determine the name of the original file, and uses the original name with .out appended.
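    The naming rules above can be sketched as a small shell function. This is only an illustration of the suffix table: the function name bunzip2_output_name and the sample file names are made up for this example, and the function does not call bzip2 itself.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the documented suffix rules.
# It only computes a name; it does not run bzip2.
bunzip2_output_name() {
    case "$1" in
        *.tbz2) printf '%s.tar\n' "${1%.tbz2}" ;;
        *.tbz)  printf '%s.tar\n' "${1%.tbz}" ;;
        *.bz2)  printf '%s\n' "${1%.bz2}" ;;
        *.bz)   printf '%s\n' "${1%.bz}" ;;
        *)      printf '%s.out\n' "$1" ;;   # unrecognised ending
    esac
}

# Sample names (assumptions, chosen to cover each rule once).
for name in notes.bz2 notes.bz backup.tbz2 backup.tbz notes.txt; do
    printf '%s -> %s\n' "$name" "$(bunzip2_output_name "$name")"
done
```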

    bunzip2 correctly decompresses a file that is the concatenation of two or more compressed files. The result is the concatenation of the corresponding uncompressed files.

    Integrity testing (-t) of concatenated compressed files is supported.

    Files are written to standard output when -c is given.
    Multiple files may be compressed and decompressed this way.
    The resulting outputs are fed sequentially to stdout. Compressing multiple files in this manner generates a stream containing multiple compressed file representations.

    bzcat (or bzip2 -dc) decompresses to the standard output.

    bzip2 reads arguments from the environment variables $BZIP2 and $BZIP, in that order, and will process them before any arguments read from the command line. This gives a convenient way to supply default arguments.

    Return values:
    0 normal exit
    1 environmental problems (file not found, invalid flags, I/O errors, etc.)
    2 corrupt compressed file
    3 internal consistency error (e.g., a bug)
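
    A script that wraps bzip2 can translate these return values into messages. The sketch below is one possible way to do it; the function name describe_bzip2_status is hypothetical, and the commented usage line assumes a file called archive.bz2 exists.

```shell
#!/bin/sh
# Hypothetical helper: map a bzip2 exit status to the meanings listed above.
describe_bzip2_status() {
    case "$1" in
        0) echo "normal exit" ;;
        1) echo "environmental problem" ;;
        2) echo "corrupt compressed file" ;;
        3) echo "internal consistency error" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Typical use (assumes bzip2 is installed and archive.bz2 exists):
#   bzip2 -t archive.bz2; describe_bzip2_status "$?"
describe_bzip2_status 2
```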

    Options:
    -t, --test
        Test the integrity of the files without writing output.
    -c, --stdout
        Write to standard output. Useful for piping.
    -f, --force
        1. Overwrites existing output files.
        2. Severs hard links to files.
        3. Files that do not appear to be compressed (no valid magic header bytes) are passed through unmodified.
    -k, --keep
        Keep input files.
    -s, --small
        Reduce memory usage (at the expense of creating larger output files).
    -q, --quiet
        Suppress non-essential warnings.
        I/O errors and critical events will not be suppressed.
    -v, --verbose
        Show the compression ratio. Multiple -v's increase the verbosity.
    -1 (or --fast) to -9 (or --best)
        Set the block size to 100 k, 200 k .. 900 k when compressing.
        Block sizes below the default of 900 k are only useful in very small memory environments.
    --fast, --best
        Aliases for GNU gzip compatibility.
        --fast doesn't make things significantly faster; --best selects the default behaviour.
    --
        Treat all subsequent arguments as file names, even if they start with a dash.
        Example: bzip2 -- -myfilename

    Recovering data from damaged files:

    bzip2 compresses files in blocks, each handled independently. If an error damages a file, it may be possible to recover data from the undamaged blocks using bzip2recover.

    bzip2, bunzip2 and bzcat are the same program, and the decision about what actions to take is done on the basis of which name is used.

    Author: Julian Seward, jseward@bzip.org. Home page: bzip.org

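    gzip options

    -l, --list
        List the compressed size, uncompressed size, ratio, and uncompressed name of each compressed file: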

    gzip -l *gz
              compressed        uncompressed  ratio uncompressed_name
                     20                   0   0.0% smother.diske-
                     20                   0   0.0% smother.diskf-
                     20                   0   0.0% smother.diskg-
                     20                   0   0.0% smother.diskh-
              798830592          3596346423  77.8% smother_wd0e
              798830672          3596346423  77.8% (totals) 
    With --verbose, the compression method, CRC, and time stamp are also shown:
     gzip -lv *gz
    method  crc     date  time           compressed        uncompressed  ratio uncompressed_name
    defla 00000000 Sep  1 15:00                  20                   0   0.0% smother.diske-
    defla 00000000 Sep  1 15:01                  20                   0   0.0% smother.diskf-
    defla 00000000 Sep  1 15:01                  20                   0   0.0% smother.diskg-
    defla 00000000 Sep  1 15:01                  20                   0   0.0% smother.diskh-
    defla dbd673f2 Sep  1 16:09           798830592          3596346423  77.8% smother_wd0e
                                          798830672          3596346423  77.8% (totals) 
    The uncompressed size is given as -1 for files not in gzip format, such as compressed .Z files.
    To get the uncompressed size for such a file, use: zcat file.Z | wc -c
    The crc is given as ffffffff for a file not in gzip format.
    The title and totals lines are not displayed with --quiet.


    -r, --recursive
        Travel the directory structure recursively.
    -S suf, --suffix suf
        Use suffix suf instead of the default .gz.
        Most useful for decompression.
        Any suffix can be given, but suffixes other than .z and .gz are best avoided, since they cause confusion when files are transferred to other systems.
        A null suffix forces gunzip to try decompression on all given files regardless of suffix, as in:
         gunzip -S "" *        (*.* for MSDOS)
        Previous versions of gzip used the .z suffix. This was changed to avoid a conflict with pack.

    -t, --test
        Test the compressed file integrity.
    -q, --quiet
        Suppress all warning messages.
    -v, --verbose
        Display the name and percentage reduction for each file.
    -n, --no-name
        When compressing, do not save the original file name and time stamp by default. (The original name is always saved if the name had to be truncated.)
        When decompressing, do not restore the original file name if present (remove only the gzip suffix from the compressed file name) and do not restore the original time stamp if present (copy it from the compressed file).
        Default when decompressing.
    -N, --name
        When compressing, always save the original file name and time stamp (default).
        When decompressing, restore the original file name and time stamp. Useful on systems which have a limit on file name length.
    -c, --stdout, --to-stdout
        Write output on standard output; keep original files unchanged.
        If there are several input files, the output consists of a sequence of independently compressed members.
        To obtain better compression, concatenate all input files before compressing them.
    -1 (or --fast) to -9 (or --best)
        Specify the speed/compression tradeoff:
        --fast or -1 is the fastest method (less compression);
        --best or -9 is the slowest method (most compression).
        The default is -6 (biased towards high compression at the expense of speed).
    -f, --force
        Force compression or decompression even if the file has multiple links or the corresponding file already exists, or if the compressed data is read from or written to a terminal. If the input data is not in a format recognized by gzip, and if --stdout is also given, copy the input data without change to the standard output: let zcat behave as cat. If --force is not given, and when not running in the background, gzip prompts to verify whether an existing file should be overwritten.
    -L, --license
        Display the gzip license, then quit.

    The following command will find all gzip files in the current directory and subdirectories, and extract them in place without destroying the original:

            find . -name '*.gz' -print | sed 's/^\(.*\)[.]gz$/gunzip < "&" > "\1"/' | sh 

    Advanced usage

    Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once.

    This is an example of concatenating gzip files:

         gzip --to-stdout file1  > foo.gz
         gzip --to-stdout file2 >> foo.gz 

    In case of damage to one member of a .gz file, other members can still be recovered (if the damaged member is removed).
    Better compression is obtained by compressing all members at once:

        cat file1 file2 | gzip > foo.gz 

    compresses better than gzip --to-stdout file1 file2 > foo.gz

    To recompress concatenated files to get better compression: zcat old.gz | gzip > new.gz
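
    The following sketch puts the concatenation and recompression steps together in a scratch directory. The file names and sample contents are assumptions made for the demonstration, and gzip -dc is used in place of zcat (the two are described as equivalent elsewhere in these notes). The exact sizes printed depend on the gzip version.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Two sample inputs (hypothetical names and contents).
i=0
while [ "$i" -lt 200 ]; do
    echo "sample line for compression" >> file1
    echo "sample line for compression" >> file2
    i=$((i + 1))
done

# Append two independently compressed members into one .gz file.
gzip -c file1 >  members.gz
gzip -c file2 >> members.gz

# Recompress the decompressed stream as a single member.
gzip -dc members.gz | gzip > single.gz

members_size=$(wc -c < members.gz)
single_size=$(wc -c < single.gz)
echo "two members: $members_size bytes, one member: $single_size bytes"

# Both archives decompress to the same data as "cat file1 file2".
cat file1 file2 > expected
gzip -dc members.gz | cmp - expected
gzip -dc single.gz  | cmp - expected
```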

    If a compressed file consists of several members, the uncompressed size and CRC reported by the --list option apply to the last member only.
    To display the uncompressed size for all members, use:

         zcat file.gz | wc -c 

    To create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently.

    gzip is designed as a complement to tar, not as a replacement.


    The environment variable $GZIP can hold a set of default options for gzip. These options are interpreted first and can be overridden by explicit command-line parameters.
    For example:
    for sh:    GZIP="-8v --name"; export GZIP
    for csh:   setenv GZIP "-8v --name"
    for MSDOS: set GZIP=-8v --name 

    Using gzip on tapes

    When writing compressed data to a tape, it is generally necessary to pad the output with zeroes up to a block boundary.
    When the data is read and the whole block is passed to gunzip for decompression, gunzip detects that there is extra trailing garbage after the compressed data and emits a warning by default. Use --quiet to suppress the warning.
    This option can be set in $GZIP as in:

    for sh:    GZIP="-q"  tar -xfz --block-compress /dev/rst0
    for csh:   (setenv GZIP "-q"; tar -xfz --block-compress /dev/rst0) 

    In the above example, gzip is invoked implicitly by the -z option of GNU tar. Make sure that the same block size (-b option of tar) is used for reading and writing compressed data on tapes. (This example assumes you are using the GNU version of tar.)
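
    The padding behaviour can be tried without a tape drive. The sketch below pads a compressed file with zero bytes and decompresses it with -q; the file names and the 512-byte pad length are arbitrary assumptions. Depending on the gzip version, trailing zero bytes may be ignored silently even without -q.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "data destined for tape" > original

# Compress, then pad with zero bytes as a tape block would be
# (512 bytes is an arbitrary pad length chosen for illustration).
gzip -c original > block
dd if=/dev/zero bs=512 count=1 >> block 2>/dev/null

# -q suppresses the trailing-data warning; "|| true" tolerates a
# non-zero warning status so that "set -e" does not abort the script.
gzip -dcq < block > restored || true

cmp original restored && echo "restored data matches"
```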


    The original documentation contains substantial discussion related to legacy versions running on VM, MSDOS... which had file systems with severe limitations, including on the length of file names.

    gzip reduces the size of the named files using Lempel-Ziv coding (LZ77), replacing each file by one with the extension .gz.
    If no files are specified, or if a file name is -, standard input is compressed to standard output.
    gzip only attempts to compress regular files (symbolic links are ignored).

    gzip keeps the original file name and time stamp in the compressed file; these are used when decompressing the file with -N.
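
    A short experiment shows the stored name in use. This is a sketch with made-up file names (report.txt, renamed.gz), run in a scratch directory; gzip -dN is used here as the equivalent of gunzip -N.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "report contents" > report.txt   # hypothetical file name

gzip report.txt                 # creates report.txt.gz and stores the name
mv report.txt.gz renamed.gz     # renaming does not change the stored name

gzip -dN renamed.gz             # restores report.txt rather than "renamed"
ls
```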

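    gunzip takes a list of files on its command line and replaces each file whose name ends with .gz with an uncompressed file without that extension. zcat is identical to gunzip -c. zcat uncompresses files that have the correct magic number whether they have a .gz suffix or not.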

    Apple gzip 272.250.1
    usage: gzip [-123456789acdfhklLNnqrtVv] [-S .suffix] [<file> [<file> ...]]
     -1 --fast            fastest (worst) compression
     -2 .. -8             set compression level
     -9 --best            best (slowest) compression
     -c --stdout          write to stdout, keep original files
     -d --decompress      uncompress files
     -f --force           force overwriting & compress links
     -h --help            display this help
     -k --keep            don't delete input files during operation
     -l --list            list compressed file contents
     -N --name            save or restore original file name and time stamp
     -n --no-name         don't save original file name or time stamp
     -q --quiet           output no warnings
     -r --recursive       recursively compress files in directories
     -S .suf              use suffix .suf instead of .gz
        --suffix .suf
     -t --test            test compressed file
     -V --version         display program version
     -v --verbose         print extra statistics
    gzip --license
    Apple gzip 272.250.1 (based on FreeBSD gzip 20150113)
       Copyright (c) 1997, 1998, 2003, 2004, 2006 Matthew R. Green