Create a filter to pull e-mail addresses out of STDIN


A client asked me to pull all e-mail addresses out of all the messages in his sent folder on the mail server. My first attempt was:

grep -i '^to:' * | cut -d':' -f4 | sort | uniq > ~/mail_addresses.out

which actually pulled too much (the cut was an attempt to isolate the address). It also left duplicates that differed only in case (I realize uniq can compare case-insensitively) and depended on the exact format of the output line; if that changed, I was screwed.
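To illustrate the case problem, here is a minimal sketch of one way around it: lowercase every line first, then let sort drop the duplicates. The helper name dedupe_ci and the sample addresses are made up for the example; they are not part of the original pipeline.

```shell
# Hypothetical helper: lowercase each line, then sort and drop duplicates.
dedupe_ci() {
  tr '[:upper:]' '[:lower:]' | sort -u
}

# Made-up sample addresses; two of them differ only in case.
printf '%s\n' 'Bob@Example.com' 'alice@example.com' 'bob@example.com' | dedupe_ci
```

With plain sort | uniq, Bob@Example.com and bob@example.com would both survive; after lowercasing they collapse into one line.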

 

I realized what I needed was a filter that would pull only the e-mail addresses out of a line. Thanks to the site http://www.regular-expressions.info I was able to create a simple Perl script that did just that. It should (theoretically) also pull multiple addresses off one line, though I did not test that, and it converts everything to lower case.

 

The script is:

#! /usr/bin/perl -w
# simple filter that will take STDIN, look for any e-mail addresses
# then print the lower case equivalent to STDOUT.
#
# Regex courtesy of http://www.regular-expressions.info/email.html
while ( $line = <> ) {
   # the /g modifier keeps matching, so every address on the line is found
   while ( $line =~ m/(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)/gi ) {
      print lc $1 . "\n";
   }
}
1;
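Since the multiple-addresses-per-line case went untested, a rough shell stand-in for the filter may help as a sanity check. This is only a sketch, not the attached script: it assumes a grep that supports -o and \b (GNU grep does), reuses the same pattern, and lowercases with tr. The function name extract_emails and the sample header line are invented for the example.

```shell
# Hypothetical shell stand-in for the Perl filter (assumes GNU grep for -o and \b).
extract_emails() {
  LC_ALL=C grep -oiE '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' |
    tr '[:upper:]' '[:lower:]'
}

# Made-up header line carrying two addresses.
printf '%s\n' 'To: Alice@Example.COM, "Bob" <bob@example.org>' | extract_emails
```

Each address comes out on its own line in lower case, which is the same shape of output the Perl filter is meant to produce, so the result can be fed to sort and uniq in the same way.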

 

Example usage is:

grep -i '^to:' USER_MAIL_DIR/.Sent/cur/* | ./parse_email_address.pl | sort | uniq > mail.out

That should get every address that appears on a To: line. If you want every address, anywhere in the messages, change that to:

cat USER_MAIL_DIR/.Sent/cur/* | ./parse_email_address.pl | sort | uniq > mail.out

 

I could have created something that did it all, but decided a small filter was more Unix-like and would be more useful for other projects in the future.

Attached files: parse_email_address.pl

Last update: 2013-06-13 01:17
Author: Rod
Revision: 1.1