Investigate disk usage

Recently I had to investigate the usage of a very large volume with a lot of data files, owned by several users.


I started with "agedu", but I could not get the information I needed out of it, so I switched to running find with stat and putting everything into MySQL.


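So the first step was to run the following find command. For every file and directory it prints one colon-separated line with the file type, name, size in bytes, owner name and uid, group name and gid, and the access, modification and change times as Unix timestamps: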


# find /nfs/bigfiler -exec stat --format="%F:%n:%s:%U:%u:%G:%g:%X:%Y:%Z" {} + > /scratch/big-filer.info
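One caveat with the colon-separated format: %n prints the path as-is, so a file name containing ':' (or a newline) produces a line that does not have exactly ten fields, and LOAD DATA will then put the wrong values into the columns of that row. Below is a minimal Python sketch for spotting such lines before loading, assuming exactly the format string above. The helpers find_bad_lines and split_stat_line are hypothetical names, and split_stat_line assumes that the file type and the eight trailing fields (numbers and user/group names, which cannot contain ':' in /etc/passwd and /etc/group) never contain ':', so any extra colons must belong to the path.

```python
import sys
from typing import Iterable, Iterator, List, Tuple

# Number of fields in: stat --format="%F:%n:%s:%U:%u:%G:%g:%X:%Y:%Z"
EXPECTED_FIELDS = 10


def find_bad_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line without exactly ten fields."""
    for number, line in enumerate(lines, start=1):
        if line.rstrip("\n").count(":") != EXPECTED_FIELDS - 1:
            yield number, line


def split_stat_line(line: str) -> List[str]:
    """Split one line into its fields, keeping any ':' inside the path.

    Assumes the file type (%F) and the eight trailing fields contain no ':',
    so everything between the first colon and the last eight is the path.
    Raises ValueError if the line has no ':' at all; returns fewer than ten
    items if the line itself is truncated (e.g. a newline in a file name).
    """
    file_type, rest = line.rstrip("\n").split(":", 1)
    return [file_type] + rest.rsplit(":", EXPECTED_FIELDS - 2)


if __name__ == "__main__" and len(sys.argv) > 1:
    # errors="surrogateescape" keeps non-UTF-8 path bytes instead of failing.
    with open(sys.argv[1], encoding="utf-8", errors="surrogateescape") as handle:
        for number, line in find_bad_lines(handle):
            text = line.rstrip("\n")
            # repr() keeps odd characters in paths printable.
            print(f"{number}: {text!r}")
```

Saved as a script and run with /scratch/big-filer.info as its only argument, it prints the line number of every row that needs fixing or removing before the LOAD DATA step.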


Create a table in MySQL:



CREATE TABLE `pieter`.`bigfiler_content` (
`fileid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`file_type` char(32) NOT NULL,
-- paths on a big filer can easily be longer than 512 characters
`filename` varchar(4096) NOT NULL,
-- bigint: file sizes above 2 GiB do not fit in int(11)
`size` bigint unsigned NOT NULL,
`user` char(16) DEFAULT NULL,
`uid` char(16) DEFAULT NULL,
`groupname` char(16) DEFAULT NULL,
`gid` char(16) DEFAULT NULL,
`time_access` datetime DEFAULT NULL,
`time_mod` datetime DEFAULT NULL,
`time_change` datetime DEFAULT NULL,
PRIMARY KEY (`fileid`),
KEY `idx_file_type` (`file_type`) USING BTREE,
KEY `idx_user` (`user`) USING BTREE
);


Load the data into MySQL:



-- the file must be readable by the MySQL server; use LOAD DATA LOCAL INFILE if it is on the client
LOAD DATA INFILE '/scratch/big-filer.info'
INTO TABLE pieter.bigfiler_content
-- ESCAPED BY '': take backslashes in file names literally
FIELDS TERMINATED BY ':' ESCAPED BY ''
LINES TERMINATED BY '\n'
(file_type, filename, size, user, uid, groupname, gid, @time_access, @time_mod, @time_change)
SET time_access = FROM_UNIXTIME(@time_access),
time_mod = FROM_UNIXTIME(@time_mod),
time_change = FROM_UNIXTIME(@time_change);



And now you can run nice queries to analyse the data :D
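A few examples of such queries, as a sketch: bytes per user, the largest files, and bytes per user in files nobody has read since a cutoff date. The SQL is intended to run unchanged against the MySQL table above (in the mysql client, replace each ? with a literal value); it is wrapped in Python's built-in sqlite3 module with a trimmed-down copy of the table only so the example is self-contained. The query constants, helper functions and the 'YYYY-MM-DD' cutoff format are hypothetical choices for illustration, not part of the setup above.

```python
import sqlite3
from typing import List, Tuple

# Trimmed-down stand-in for pieter.bigfiler_content (only the columns the
# queries below use); SQLite is used purely so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE bigfiler_content (
    fileid      INTEGER PRIMARY KEY,
    file_type   TEXT NOT NULL,
    filename    TEXT NOT NULL,
    size        INTEGER NOT NULL,
    user        TEXT,
    time_access TEXT  -- 'YYYY-MM-DD HH:MM:SS', like a MySQL DATETIME
)
"""

# Number of files and total bytes per owner, biggest consumers first.
USAGE_PER_USER = """
SELECT user, COUNT(*) AS files, SUM(size) AS total_bytes
FROM bigfiler_content
WHERE file_type IN ('regular file', 'regular empty file')
GROUP BY user
ORDER BY total_bytes DESC
"""

# The largest individual files.
LARGEST_FILES = """
SELECT filename, user, size
FROM bigfiler_content
WHERE file_type = 'regular file'
ORDER BY size DESC
LIMIT ?
"""

# Bytes per owner in files not accessed since the cutoff date.
STALE_BYTES_PER_USER = """
SELECT user, SUM(size) AS stale_bytes
FROM bigfiler_content
WHERE file_type IN ('regular file', 'regular empty file')
  AND time_access < ?
GROUP BY user
ORDER BY stale_bytes DESC
"""


def usage_per_user(conn: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    return conn.execute(USAGE_PER_USER).fetchall()


def largest_files(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, str, int]]:
    return conn.execute(LARGEST_FILES, (limit,)).fetchall()


def stale_bytes_per_user(conn: sqlite3.Connection, cutoff: str) -> List[Tuple[str, int]]:
    return conn.execute(STALE_BYTES_PER_USER, (cutoff,)).fetchall()
```

Keep in mind that the access times are only as reliable as the mount options (noatime, relatime) of the volume allow, so the "stale" query is a hint rather than proof that nobody uses those files.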