I was reading LJ 150, and while reading Dave Taylor's article "Analyzing Log Files", I noticed that the order of commands in his pipelines affects how much processor time they consume.
At one point he gives the following command to search for HTML files in the access_log:
awk '{ print }' access_log | sort | uniq -c | sort -rn | grep ".html" | head
On my system, this command takes:
real 0m0.097s
user 0m0.084s
sys 0m0.020s
If you instead put the grep command right after the awk command:
awk '{ print }' access_log | grep ".html" | sort | uniq -c | sort -rn | head
the pipeline takes only:
real 0m0.042s
user 0m0.028s
sys 0m0.012s
The reason the second version is faster is that the first pipeline sorts the whole dataset and only afterwards removes the non-.html entries with grep. The second one (the one I suggest) removes all the non-.html entries first, so sort only has to process the much smaller remainder.
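The effect is easy to reproduce without a real access_log. The sketch below is an assumption-laden illustration: sample.log is a made-up file, and the 1-in-10 ratio of .html entries is arbitrary. It generates fake entries and runs both orderings (the dot in the pattern is escaped here so grep matches a literal period):

```shell
#!/bin/sh
# Generate 10,000 fake log lines; every 10th one ends in .html
seq 1 10000 | awk '{ if ($1 % 10 == 0) print $1 ".html"; else print $1 ".gif" }' > sample.log

# Slower ordering: sort processes all 10,000 lines, then grep discards 90% of them
time sort sample.log | grep '\.html' | uniq -c | sort -rn | head > /dev/null

# Faster ordering: grep discards 90% first, so sort only handles 1,000 lines
time grep '\.html' sample.log | sort | uniq -c | sort -rn | head > /dev/null

rm sample.log
```

Both orderings produce the same output; only the amount of data fed into sort differs, which is where the time goes.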