Quick Bar Chart of Disk Usage

Today I was searching for a command that I had used a long time ago, but ran into a much more interesting one instead.  At the time, I must have needed to find out which files were the largest disk hogs and whether there was a long tail (i.e., how many of the 3.7M files in this directory, not my fault by the way, were inconsequential).  That brings us to this wonderful “one-line” command:

find /dir/ -name "*.xml" -exec du -s {} \; | perl -ne 'if (/^(\d+)\s+(.*)/) { $h{$2} = $1; if ($max < $1) { $max = $1; } if (length($2) > $maxfname) { $maxfname = length($2); } } END { map { $barlen = ($h{$_} / $max) * 50; $bar = "*" x $barlen; printf ("%" . $maxfname . "s" . "(%5d): %s", $_, $h{$_}, $bar); print "\n"; } sort { $h{$b} <=> $h{$a} } keys %h }' 2> /dev/null > report.txt

Specifically, it finds every XML file under /dir/ and runs the Linux du command on each one to get its size on disk.  That list of sizes and filenames is piped to a hacky Perl script that pulls out each size, tracks the largest one, builds a horizontal histogram bar scaled to that maximum (at most 50 asterisks wide), and prints the list sorted from largest to smallest.  Lastly, the result is saved to report.txt.

That’s quite a quick and dirty trick, but it produces nice command-line output like this:

/dir/w6bz9whg.xml(36560): **************************************************
/dir/w6km312r.xml(31772): *******************************************
/dir/w68d03gz.xml(27728): *************************************
/dir/w6vt5fhv.xml(27076): *************************************
/dir/w6m07v80.xml(17420): ***********************
/dir/w68m0zj8.xml(15276): ********************
/dir/w6mq7qpz.xml(15052): ********************
/dir/w6vq30tq.xml(13808): ******************
/dir/w6tb51hr.xml(13160): *****************
...
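
The bar scaling described above can also be written as a small reusable filter. What follows is only a sketch, not the command I originally ran: the function name bar_chart and its optional width argument are made up for illustration, and it swaps the Perl for sort and awk. It expects du -s style lines (size, whitespace, filename) on standard input, and unlike the Perl version it does not pad the filenames to a common width.

```shell
# Hypothetical helper (name and argument are illustrative assumptions).
# Reads "SIZE NAME" lines on stdin, prints them largest first with a bar
# of asterisks scaled so the largest entry is `width` characters wide.
bar_chart() {
    width="${1:-50}"                 # maximum bar width, defaults to 50
    sort -rn | awk -v width="$width" '
        {
            size = $1
            name = $0
            sub(/^[0-9]+[ \t]+/, "", name)   # drop the size column, keep the name
            if (NR == 1) max = size          # input is sorted, so line 1 is the max
            n = (max > 0) ? int(size * width / max) : 0
            bar = ""
            for (i = 0; i < n; i++) bar = bar "*"
            printf "%s(%5d): %s\n", name, size, bar
        }'
}
```

Something like find /dir/ -name "*.xml" -exec du -s {} \; | bar_chart 50 > report.txt should then give a report in the same shape as the one above. Sorting first means the maximum is known from the first line, so nothing has to be held in memory.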

 

Command Line Tricks

I am always using command-line shortcuts for various tasks, and I often have to look the tricks up again every time I need to do something remotely fancy.  Here are some of my most-used helpful hints:

  • To remove the leading spaces and tabs from each line of text on standard input (so feed it through a pipe), this sed command works well:
    sed -e 's/^[ \t]*//'
  • To reformat XML/HTML files so that line breaks inside tags are removed:
    xmllint --format --noblanks infile.xml > outfile.xml
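
As a quick illustration of the sed hint, here is a sketch that wraps it in a function and feeds it a few sample lines. The function name strip_leading_ws is made up, and the expression uses the POSIX class [[:blank:]] (space and tab) in place of \t, on the assumption that not every sed treats \t inside brackets as a tab.

```shell
# Hypothetical wrapper name; strips leading spaces and tabs from each stdin line.
strip_leading_ws() {
    sed -e 's/^[[:blank:]]*//'
}

# Sample input: one line indented with spaces, one with tabs, one not indented.
printf '   indented with spaces\n\t\tindented with tabs\nnot indented\n' | strip_leading_ws
```

All three lines should come out flush against the left margin, with the rest of each line untouched.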

Command Line Master

I wanted to post the craziest command-line script I’ve used in a long time.  I used it to convert names listed in XML tags in an EAC-CPF record into the names of files to copy.

grep -h -o -P "<relationEntry>(.*?)</relationEntry>" *.xml |
 sed -e 's/<[a-zA-Z0-9\/\+]*>//g' |
 awk '{print tolower($0)}' |
 sed -e 's/[ ,.\(\):]\+/\-/g' |
 sed -e 's/$/cr.xml/g' |
 while read -r x ; do cp "/data/production/data/$x" eac_data/. ; done
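
To see what the name-to-filename steps do before copying anything, the middle of the pipeline can be tried on its own. This is a sketch under a few assumptions: the function name entry_to_filename and the sample name are invented for illustration, the tags are assumed to be stripped already, and sed -E is used so that + works without the GNU-only \+ escape.

```shell
# Hypothetical helper: lowercases each name on stdin, collapses runs of
# spaces and punctuation into a single hyphen, then appends "cr.xml".
entry_to_filename() {
    awk '{ print tolower($0) }' |
        sed -E -e 's/[ ,.():]+/-/g' -e 's/$/cr.xml/'
}

# Made-up sample entry, not taken from real data.
printf '%s\n' 'Smith, John A. (1900-1980)' | entry_to_filename
```

Note that the trailing punctuation of the name is what produces the hyphen in front of cr.xml, so a name that ends in a plain letter would get the suffix with no separator.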