Tar Archives with a Twist




hashtar - manual page for hashtar 1.0.2


hashtar {-c | --create} <file>...
{-t | --list}
[-h | --help | --detailed-help | --version]


The hashtar utility creates POSIX tar archives, but for each given regular file, it copies the *hash value* rather than the actual data into the archive. Other than that, the archive matches the original tree. In particular, directory structure, symbolic links and status information (timestamps, ownership, ...) are retained.
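The idea can be sketched in a few lines of Python with the standard tarfile module. This is an illustration only, not hashtar's implementation: the helper names write_hash_tar and _add are made up, and storing the digest as 40 hexadecimal characters is an assumption, since the encoding of the hash inside the archive is not specified here.

```python
import hashlib
import io
import os
import tarfile


def write_hash_tar(paths, fileobj):
    """Write a tar stream in which regular files carry their SHA-1 digest.

    Hypothetical helper for illustration. Storing the digest as hex text
    is an assumption, not necessarily what hashtar does.
    """
    # "w|" is tarfile's stream mode, so fileobj only needs a write() method.
    with tarfile.open(fileobj=fileobj, mode="w|", format=tarfile.PAX_FORMAT) as tar:
        for path in paths:
            _add(tar, path)


def _add(tar, path):
    # gettarinfo() uses lstat(), so symbolic links stay symbolic links and
    # timestamps, ownership and mode are copied into the header.
    info = tar.gettarinfo(path)
    if info is None:  # unsupported file type, e.g. a socket
        return
    if info.isreg():
        sha1 = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                sha1.update(chunk)
        payload = sha1.hexdigest().encode("ascii")
        info.size = len(payload)  # the member holds the hash, not the data
        tar.addfile(info, io.BytesIO(payload))
    else:
        tar.addfile(info)
        if info.isdir():
            for name in sorted(os.listdir(path)):
                _add(tar, os.path.join(path, name))
```

Every regular file contributes one header plus a fixed-size payload to the archive, regardless of how large the file is on disk.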

Since only the fixed-size hashes are stored, the size of the archive depends only on the number of files, not on their sizes. After compression, a typical archive is not much larger than the corresponding file list. In contrast to a file list, however, one can extract the tarball to get a "virtual view" of the original tree.

Hashtar was written with performance in mind. It contains git's highly optimized SHA-1 implementation and spawns as many worker processes as there are available CPUs. Input files are memory-mapped and pre-faulted serially from the controlling process, ensuring a streaming input workload.

Portability was not a priority. Hashtar likely works only on GNU/Linux. In particular, both sched_getaffinity() and the MAP_POPULATE flag of mmap(2) are Linux-specific.
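For readers unfamiliar with these two facilities, the following Python sketch shows what each one provides. It is not taken from hashtar, which is not written in Python; the function names are made up, and the sketch assumes Linux and a Python version whose mmap module exposes MAP_POPULATE (3.10 or later), falling back to an ordinary mapping otherwise.

```python
import hashlib
import mmap
import os


def available_cpus():
    """Number of CPUs the calling process is allowed to run on (Linux only)."""
    return len(os.sched_getaffinity(0))


def sha1_prefaulted(path):
    """Hash a file through a read-only mapping that is populated up front."""
    # With MAP_POPULATE the kernel reads the pages in while the mapping is
    # set up, instead of on first access. Older Pythons lack the constant,
    # in which case this degrades to a plain mapping.
    populate = getattr(mmap, "MAP_POPULATE", 0)
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            # Empty files cannot be mapped; hash the empty string instead.
            return hashlib.sha1(b"").hexdigest()
        with mmap.mmap(f.fileno(), size,
                       flags=mmap.MAP_PRIVATE | populate,
                       prot=mmap.PROT_READ) as mapping:
            return hashlib.sha1(mapping).hexdigest()
```

The affinity mask can be smaller than the number of installed CPUs, for example under taskset(1), which is why it is the more accurate basis for sizing a worker pool.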

Hashtar operates either in "create" or in "list" mode, depending on the given options. There is no "extract" mode since plain tar can be employed for this purpose.


-h, --help

print help and exit

--detailed-help

like --help but also prints description and license

-v, --version

print version and exit

-c, --create

create a new archive

The remaining arguments are expected to be files or directories to be added to the archive. The archive is written to stdout. Directories are archived recursively.

-t, --list

list the contents of an archive

A tar archive is read from stdin and a file list is written to stdout. In contrast to plain tar(1) output, the list contains only the regular files, and hash values are included. The output is suitable as input for sha1sum --check.
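The list format can be illustrated with a short Python sketch. It is not hashtar's code: the function name list_hashes is made up, and it assumes that each regular member's payload is the hexadecimal digest as text, which may differ from the encoding hashtar actually uses. The line layout, digest followed by two spaces and the file name, is the one sha1sum --check reads.

```python
import tarfile


def list_hashes(fileobj):
    """Yield 'digest  name' lines for the regular files of a hash archive.

    Hypothetical helper. Assumes the member payload is the hex digest as
    ASCII text, which is an assumption about the archive layout.
    """
    # "r|" reads the archive as a forward-only stream, as from stdin.
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            if not member.isreg():
                continue  # directories, symlinks etc. are not listed
            digest = tar.extractfile(member).read().decode("ascii").strip()
            yield f"{digest}  {member.name}"
```

Passing sys.stdin.buffer as fileobj and printing each yielded line would mimic reading the archive from stdin and writing the list to stdout.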




hashtar -c directory | tar -tv

hashtar -c directory | hashtar -t | sha1sum -c


Written by Andre Noll.


Report bugs to Andre Noll <maan@tuebingen.mpg.de>

Git version: aabf1d (November 2018)


Copyright © 2016 Andre Noll <maan@tuebingen.mpg.de>

License: GNU GPL version 2

This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.


tar(1), sha1sum(1)

Homepage: http://people.tuebingen.mpg.de/~maan/hashtar