adu

Advanced disk usage



README

adu creates a database containing disk usage statistics of a given directory. This database can be queried to quickly retrieve, for example, the number and the size of all files in a subdirectory owned by a given user.

Four different output modes are available: global list, global summary, user list and user summary. The format of the output may be customized via format strings.

There is also an interactive mode which lets you quickly run many queries on the same database using different modes and different output files.

By default, adu uses the user-summary output format which looks like this:

User summary
root                0   605    12K  267m
mysql             103     8   144    81m
postgres          113    19   506    31m
man                 6    37    87     2m
syslog            101     1    54     1m
...

The user-list mode prints the largest directories of one or more users:

uid 0 (root):
  55m    8  /var/cache/apt/apt-file/
  43m   35  /var/lib/apt/lists/
  27m    6K /var/lib/dpkg/info/
  25m    4  /var/cache/apt/
  20m  118  /var/lib/gconf/defaults/

Download

Only the source code is available for download. Use git to clone the adu repository by executing

git clone git://git.tuebingen.mpg.de/adu

or grab the tarball of the current tree. If you prefer to download the tarball of the latest release, select the corresponding snapshot link on the adu gitweb page.


INSTALL

As adu is based on libosl, the object storage layer, you first have to install libosl.

Adu’s command line parser and the interactive help are generated by GNU gengetopt. Hence the gengetopt package must be installed to compile adu from source.

To generate the man page, help2man must be installed.


License

adu is open source software, licensed under the GNU General Public License, Version 2.


Contact

Email: André Noll, maan@tuebingen.mpg.de, Homepage: http://people.tuebingen.mpg.de/maan/

Comments and bug reports are welcome. Please provide enough info such as the version of adu/libosl you are using and relevant parts of the logs. Including the string [adu] in the subject line is also a good idea.

Man page

 

NAME

adu - advanced disk usage  

SYNOPSIS

adu [OPTIONS]...

DESCRIPTION

adu-1.0.0

adu creates a database containing disk usage statistics of a given directory. This database can be queried to quickly retrieve usage patterns of subdirectories and/or files owned by a given user id.

-h, --help
Print help and exit
--detailed-help
Print help, including all details and hidden options, and exit
-V, --version
Print version and exit
 

General options:

-l, --loglevel=<level>
Set loglevel (0-6) (default=`4')

Log messages are always written to stderr while normal output goes to stdout. Lower values mean more verbose logging.

Group: database

There are two ways to specify a database directory. You can either specify a full path using the database-dir option or a root path using the database-root option. In the latter case, a directory structure matching that of the base-dir argument is created below the given root path.

The advantage of using database-root is that the base-dir is used to find the relevant database in both create and select mode, so you do not have to set database-dir explicitly.
-d, --database-dir=<path>
directory containing the osl tables

Full path to the directory containing the osl tables. This directory is created if it does not exist. It must be writable for the user running adu in --create mode and readable in --select mode.
-r, --database-root=<path>
directory containing directories containing the osl tables (default=`/var/lib/adu')

Base path to the directory containing the osl tables. The real database-dir is generated by appending base-dir. This directory is created if it does not exist. When used in select mode you have to specify the base-dir as well.
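
The way database-root and base-dir combine can be sketched in Python as follows. This is only an illustration: the text above says that base-dir is appended to database-root, and the exact normalization adu applies (for example to trailing slashes) is an assumption here. The function name effective_database_dir is hypothetical.

```python
import posixpath


def effective_database_dir(database_root: str, base_dir: str) -> str:
    """Append base_dir to database_root (assumed behaviour, not adu's code)."""
    # Drop the leading slash so that join() does not discard database_root.
    relative = posixpath.normpath(base_dir).lstrip("/")
    return posixpath.join(database_root, relative)


# With the default database-root, a database for /home/alice would live here:
print(effective_database_dir("/var/lib/adu", "/home/alice"))
```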

Modes:


Group: mode

adu may be started in one of three possible modes, each of which corresponds to a different command line option. Exactly one of these options must be given.
-C, --create
Create a new database

Traverse the given directory and track disk usage on a per-user basis. Results are stored in N + 1 osl tables where N is the number of uids that own at least one regular file in that directory.
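
The per-user accounting described above can be illustrated with a short Python sketch. It is not adu's implementation: it keeps the totals in a dictionary rather than in osl tables, uses st_size as the file size (an assumption) and does not detect hard links. The function name usage_by_uid is hypothetical.

```python
import os
import stat
from collections import defaultdict


def usage_by_uid(base_dir):
    """Return {uid: [file_count, total_bytes]} for regular files below base_dir."""
    totals = defaultdict(lambda: [0, 0])

    def warn(err):
        # Unreadable subdirectories are reported and otherwise ignored.
        print(f"warning: cannot read {err.filename}: {err.strerror}")

    for dirpath, _dirnames, filenames in os.walk(base_dir, onerror=warn):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, devices
                entry = totals[st.st_uid]
                entry[0] += 1
                entry[1] += st.st_size
    return dict(totals)
```

In this sketch, the number of keys in the returned dictionary corresponds to N, the number of uids that own at least one regular file.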
-I, --interactive
activate interactive mode

In this mode, adu reads commands from stdin. This makes it possible to run different select queries without opening the underlying osl database for each query (which is expensive).

In interactive mode, several subcommands are available, see the end of this document.
-S, --select
query a database previously created with --create

This option prints statistics about matching subdirectories to stdout, to an output file or pipes the output to a given command, depending on the --output option. The output format can be customized by specifying select options, see below.
 

Options for --create:

-b, --base-dir=<path>
directory to traverse

The base directory to be traversed recursively. A warning message is printed for each subdirectory that could not be read because of insufficient permissions. These directories will be ignored when computing statistics.
-x, --one-file-system
do not dive into other file systems (default=off)

Skip directories that are on different file systems from the one that the argument being processed is on.
--hash-table-bits=<num>
specify the size of the uid hash table (default=`10')

Use a hash table of size 2^num for the uid entries. If more than 2^num different uids own at least one regular file under base-dir, the command fails. Increase this value if you have more than 1024 users. Decreasing the value causes adu to use slightly less memory.
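
As a quick way to pick a value, the following Python sketch computes the smallest num for which 2^num is at least the number of uids. The helper name min_hash_table_bits is hypothetical and not part of adu.

```python
def min_hash_table_bits(num_uids: int) -> int:
    """Smallest num such that 2**num >= num_uids."""
    if num_uids < 1:
        raise ValueError("need at least one uid")
    return (num_uids - 1).bit_length()


for n in (1000, 1024, 1025, 5000):
    print(n, min_hash_table_bits(n))
```

The default of 10 is sufficient for up to 1024 uids; 1025 uids already require 11 bits.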
-B, --bloom-filter-order=<order>
use bloom filters for hard link detection (default=`23')

Allocate bloom filters of size 2^order bits. Each regular file with a hard link count greater than one is added to these filters, which makes it possible to detect hard links on a per-user basis. Greater values reduce the probability of false positives but require more memory.

Values less than 10 deactivate this feature so that no hard links are detected.
-N, --num-bloom-filter-hash-functions=<num>
number of hash functions for the bloom filters (default=`10')

Cause each entry which is added to the bloom filter to set "num" bits of the bloom filter.
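
To make the roles of "order" and "num" concrete, here is a minimal generic Bloom filter in Python. It is a sketch of the general technique, not adu's code: the hash functions (salted SHA-256) and the key format are assumptions chosen for illustration only.

```python
import hashlib


class BloomFilter:
    def __init__(self, order: int, num_hashes: int):
        self.num_bits = 1 << order              # filter size: 2^order bits
        self.num_hashes = num_hashes            # bits set per added entry
        self.bits = bytearray(self.num_bits // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + key).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: bytes) -> bool:
        """Insert key; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(key):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


bf = BloomFilter(order=16, num_hashes=10)
print(bf.add(b"dev=8,ino=1234"))   # first sighting of this (hypothetical) key
print(bf.add(b"dev=8,ino=1234"))   # second sighting: reported as seen
```

With the default order of 23, one such filter occupies 2^23 bits, that is 1 MiB. A True result from add() may be a false positive, which is why larger filters give more accurate hard link detection.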
 

Options for --select:

-s, --select-options=<options>
Options for select mode

This option takes a string whose content is another set of options as described below. Select options may be specified either directly in select mode, in which case you have to use quotes to prevent the select options from being interpreted as adu options, or via the "set" command in interactive mode.
 

Select options:

-h, --help
Print help and exit
--detailed-help
Print help, including all details and hidden options, and exit
-V, --version
Print version and exit
-u, --user=<user_name>
users to take into account

This option may be given multiple times in which case all given user names are considered admissible. See also --uid below.
-U, --uid=<uid_spec>
user id(s) to take into account

A uid specifier may be a single uid, a range of uids, or a comma-separated list of single uids or ranges. Examples:

Only consider uid 42:

--uid 42

Only consider uids greater than or equal to 42:

--uid 42-

Only consider uids between 23 and 42, inclusively:

--uid 23-42

Consider uids 23-42, 666-777 and 88:

--uid 23-42,666-777,88

If neither --user nor --uid is given (the default), all users are taken into account.
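
The uid specifier syntax shown in the examples above can be parsed along the following lines. This Python sketch only covers the forms documented here (single uid, closed range, open-ended range, comma-separated list); how adu treats other input is not specified, so the sketch simply raises ValueError. The function names are hypothetical.

```python
def parse_uid_spec(spec):
    """Return a list of inclusive (low, high) ranges; high is None if open-ended."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            ranges.append((int(low), int(high) if high else None))
        else:
            uid = int(part)
            ranges.append((uid, uid))
    return ranges


def uid_matches(uid, ranges):
    """True if uid falls into at least one of the parsed ranges."""
    return any(low <= uid and (high is None or uid <= high)
               for low, high in ranges)


print(parse_uid_spec("23-42,666-777,88"))
print(uid_matches(1000, parse_uid_spec("42-")))
```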
-l, --limit=<num>
Limit output (default=`-1')

Only print num lines of output. If negative (the default), print all lines. This option is honored in all select modes except global_summary (which outputs only a single line).
-p, --pattern=<regex>
only consider matching directories

Regular expression that must match the directory name for the directory to be considered for the output of the query. See regex(7) for details.

Depending on whether --print-base-dir is given, the absolute directory name or only the part of the directory name below the base directory is matched against "regex".

If this option is not given (the default) all directories are taken into account.

If "regex" starts with '!', directories are matched against the remaining part of "regex" and the sense of matching is reversed.
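
The effect of a leading '!' can be sketched in Python as follows. Note that Python's re module is used here only for illustration; adu matches against POSIX regular expressions as described in regex(7), whose syntax differs in some details. The function name make_dir_filter is hypothetical.

```python
import re


def make_dir_filter(pattern):
    """Return a predicate that tells whether a directory name is admissible."""
    if pattern is None:
        # No --pattern given: every directory is taken into account.
        return lambda dirname: True
    negate = pattern.startswith("!")
    regex = re.compile(pattern[1:] if negate else pattern)
    # A match is admissible unless the sense of matching is reversed.
    return lambda dirname: (regex.search(dirname) is not None) != negate


keep = make_dir_filter("!^cache/")
print(keep("lib/apt/lists/"), keep("cache/apt/"))
```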
-H, --header=<string>
use a customized header for listings/summaries

This option can be used to print any string instead of the default header line (which depends on the selected mode).

In user_list mode the header is a format string which may include the uid and the user name. See the --format option for more details.

It is possible to set this to the empty string to suppress the header completely. This is mostly useful to feed the output to scripts.
-T, --trailer=<string>
use a customized trailer for listings/summaries (default=`')

This option can be used to print any string at the end of the query output.

In user_list mode the trailer is a format string with the same semantics as the header string.
-m, --select-mode=<key>
How to print the results of the query (possible values="user_summary", "user_list", "global_summary", "global_list" default=`user_summary')

user_summary: Print totals for each admissible uid. user_list: Print a list for each admissible uid. global_summary: Only print totals. global_list: List of directories, regardless of owner.
-s, --list-sort=<key>
how to sort the user list or the global list (possible values="size", "file_count" default=`size')

This option is ignored if select-mode is neither "user_list" nor "global_list".
-o, --output=<path>
file to write output to (default=`-')

This option is only useful in interactive mode. If stdin is redirected from a script, and the script contains several queries one can use this option to let each query write its output to a different file.

If the option is not given, or its argument is either "-" or the empty string, stdout is assumed. The following conventions cause the output to be written in a different way:

If "path" starts with '>', adu truncates the output file to length zero. If "path" does not start with '>' and the file already exists, the query is aborted. Otherwise, the file is created. The output file name ">" is considered invalid.

If the first two characters of "path" are ">>", the output file (given by removing the leading ">>" from "path") is opened in append mode. It is not an error if the output file does not exist. However, as above, the output file name ">>" is considered invalid.

If the first character of "path" is '|', a pipe is created and the rest of "path" is executed with stdin redirected to the reading end of the pipe while the query output is written to the writing end of the pipe. Again, specifying only "|" is considered invalid and causes an error.

See the EXAMPLES section for examples.
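
The conventions for the --output argument can be summarized by the following Python sketch. It only classifies the argument according to the rules stated above and does not open any file or pipe. The function name classify_output and the mode names it returns are hypothetical.

```python
def classify_output(path):
    """Return (mode, target) for an --output argument."""
    if path in (None, "", "-"):
        return ("stdout", None)
    if path.startswith("|"):
        mode, target = "pipe", path[1:]        # run the rest as a command
    elif path.startswith(">>"):
        mode, target = "append", path[2:]      # file may or may not exist
    elif path.startswith(">"):
        mode, target = "truncate", path[1:]    # overwrite an existing file
    else:
        mode, target = "create", path          # query aborts if file exists
    if not target:
        # "|", ">" and ">>" on their own are invalid.
        raise ValueError(f"invalid output specifier: {path!r}")
    return (mode, target)


for arg in ("-", "file-list.root", ">out.txt", ">>out.log", "|sort -n"):
    print(arg, classify_output(arg))
```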
--user-summary-sort=<col_spec>
how to sort the user-summary (possible values="name", "uid", "dir_count", "file_count", "size" default=`size')

It is enough to specify the first letter of the column specifier, e.g. "--user-summary-sort f" sorts by file count.
--print-base-dir
whether to include the base-dir in the output (default=off)

If this flag is given, all directories printed are prefixed with the base directory. The default is to print paths relative to the base dir.
-f, --format=<format_string>
How to format the output

A string that specifies how the output of the select query is formatted. Depending on the chosen select-mode, several conversion specifiers are available and a different default value for this option applies.

adu knows four different types of directives: string, id, count and size. These are explained in more detail below.

The general syntax for string and id directives is %(name:a:w) where "name" is the name of the directive, "a" specifies the alignment and "w" is the width specifier which sets the field width.

The alignment specifier is a single character: Either "l", "r", or "c" may be given to specify left, right and centered alignment respectively. The width specifier is a positive integer. Both "a" and "w" are optional.
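
The %(name:a:w) syntax can be illustrated with a small Python sketch that expands string and id directives. It is not adu's parser: left alignment as the default and the absence of truncation for values wider than the field are assumptions, and "%%" as well as the count and size directives are not handled. The function name expand is hypothetical.

```python
import re

# %(name), %(name:a), %(name:a:w) with a in {l, r, c} and w a decimal width.
DIRECTIVE = re.compile(r"%\((\w+)(?::([lrc]?))?(?::(\d*))?\)")


def expand(fmt, values):
    """Substitute each directive in fmt by the padded value of that name."""
    def replace(match):
        name = match.group(1)
        alignment = match.group(2) or "l"      # assumed default: left
        width = int(match.group(3) or 0)
        text = str(values[name])
        pad = {"l": text.ljust, "r": text.rjust, "c": text.center}[alignment]
        return pad(width)
    return DIRECTIVE.sub(replace, fmt)


print(expand("%(uid:r:5) %(pw_name:l:6)|", {"uid": 0, "pw_name": "root"}))
```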

One string directive supported by adu is "dirname" which is substituted by the name of the directory. It is available if either user_list or global_list mode was selected via --select-mode.

Examples:

Print dirname without any padding:

"%(dirname)"

Center dirname in a 20 chars wide field:

"%(dirname:c:20)"

The count and size directives are used for non-negative numbers. The syntax for these is %(name:a:w:u). The "a" and the "w" specifiers have the same meaning as for the string and id directives. The additional "u" specifier selects the unit in which the number that corresponds to the directive is formatted. All three specifiers are optional.

Possible units are the characters of the set " bkmgtBKMGT" specifying bytes, kilobytes, megabytes, gigabytes and terabytes respectively. The difference between the lower and the upper case variants is that the lower case specifiers select 1024-based units while the upper case specifiers use 1000 as the basis.

The whitespace character is like "b", but a space character is printed instead of a unit.

Two more characters "h" and "H" (human-readable) are also available that choose an appropriate unit depending on the size of the number being printed.

An asterisk prepended to the unit specifier prevents the unit from being printed. This is useful mainly for feeding the output of adu to scripts that do not expect units.

In order to print a percent sign, use "%%". Moreover, adu understands "\n" and "\t" and outputs a newline and a tab character for these combinations respectively.

Examples:

Print size in gigabytes, right-aligned:

"%(size:r::G)"

As before, but use a 5 chars wide field:

"%(size:r:5:G)"

As before, but suppress the trailing "G":

"%(size:r:5:*G)"

The following list contains all directives known to adu, together with their types, and for which modes each of them may be used.

pw_name (string): user name. Available for user_list, user_summary and for the header in user_list mode.

uid (id): user id. Available for user_list, user_summary and for the header in user_list mode.

files (count): number of files. Available for all modes.

dirname (string): name of the directory. Available for user_list and global_list.

size (size): total size or directory size. Available for all modes.

dirs (count): number of directories. Available for user_summary and global_summary.
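
The unit specifiers for count and size directives described above can be sketched in Python as follows. The sketch covers the fixed units, the whitespace unit and the '*' prefix; the human-readable units "h" and "H" are left out. Rounding by integer division is an assumption, as adu's actual rounding is not documented here. The function name format_size is hypothetical.

```python
def format_size(num_bytes: int, unit: str = "b") -> str:
    """Format num_bytes according to a unit specifier such as "g" or "*G"."""
    suppress = unit.startswith("*")            # '*' hides the unit character
    letter = unit[1:] if suppress else unit
    if letter == " ":
        # Like "b", but a space is printed instead of a unit.
        value, suffix = num_bytes, " "
    else:
        base = 1024 if letter.islower() else 1000
        exponent = "bkmgt".index(letter.lower())
        value, suffix = num_bytes // base ** exponent, letter
    return str(value) if suppress else f"{value}{suffix}"


print(format_size(5 * 1024 ** 3, "g"))
print(format_size(5 * 10 ** 9, "*G"))
```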
 

Interactive commands:

set
change the current configuration
reset
reset configuration to defaults
help
show list of commands and one-line descriptions
run
start the query according to the current configuration
source
read and execute interactive commands from a file
 

EXAMPLES

The following example creates a database containing the disk usage patterns of the /var directory:


       $ adu --create --database-dir /root/adu-var --base-dir /var

Here's a simple query that uses the newly created database to print the user-summary:


       $ adu --select --database-dir /root/adu-var

To print the one-line global summary instead, use


       $ adu --select --database-dir /root/adu-var --select-options '--select-mode global_summary'

To sort the user summary by file count rather than by file size, run


       $ adu --select --database-dir /root/adu-var --select-options '--user-summary-sort=file_count'

The command below prints the five largest directories of the users root and mysql:


       $ adu --select --database-dir /root/adu-var --select-options '--select-mode user_list --user root --user mysql --limit 5'

The same, using short options:


       $ adu -Sd /root/adu-var -s '-m user_list -u root -u mysql -l 5'

Again the same, but omitting /var/cache:


       $ adu -Sd /root/adu-var -s '-m user_list -u root -u mysql -l 5 -p !^cache/'

A simple script for interactive mode:

       set -m user_list
       set -u root
       set -o file-list.root
       run
       reset
       set -m user_list
       set -u mysql
       set -o file-list.mysql
       run

Run adu in interactive mode with the above script (adu-script.txt):


       $ adu -Id /root/adu-var < adu-script.txt
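
Scripts like the one above are easy to generate for an arbitrary list of users. The following Python sketch produces the same sequence of interactive commands using only the documented "set", "run" and "reset" commands. The function name interactive_script and the output file prefix are hypothetical.

```python
def interactive_script(users, mode="user_list", prefix="file-list."):
    """Return an adu interactive script with one query per user."""
    lines = []
    for index, user in enumerate(users):
        if index > 0:
            lines.append("reset")              # start from the defaults again
        lines += [
            f"set -m {mode}",
            f"set -u {user}",
            f"set -o {prefix}{user}",          # one output file per user
            "run",
        ]
    return "\n".join(lines) + "\n"


print(interactive_script(["root", "mysql"]), end="")
```

The resulting text can be saved to a file and fed to adu on stdin as shown above.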
 

SEE ALSO

du(1)