Return-Path: <teddy@aas.duke.edu>
Message-ID: <4601AAB0.7070704@aas.duke.edu>
Date: Wed, 21 Mar 2007 17:59:12 -0400
From: Teddy Salazar <teddy@aas.duke.edu>
User-Agent: Thunderbird 1.5.0.10 (X11/20070304)
MIME-Version: 1.0
To: Astrachan Owen <ola@cs.duke.edu>
Subject: perl wordfreq
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

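Here are a couple of Perl attempts at a wordfreq script. I got the link
from the DULUG list; I mostly use Perl.
The first version uses -n in the shebang line, if that's OK. The -n switch
wraps the script in an implicit while (<>) { ... } loop, which is why the
printing sits in an END block. If -n isn't OK, use #2 below. The two are
nearly identical.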
1.
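#!/usr/bin/perl -n
# -n runs this body once per input line, as if inside while (<>) { ... }.
# split ' ' splits on runs of whitespace, so no empty "words" are counted.
$h{lc $_}++ for split ' ';

# END runs once after all input: print "count<TAB>word",
# most frequent first, ties in alphabetical order.
END {
    print "$h{$_}\t$_\n" for sort { $h{$b} <=> $h{$a} || $a cmp $b } keys %h;
}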
------------------------------------------------

2.
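#!/usr/bin/perl
use strict;
use warnings;

my %h;    # lowercased word => count
while (<>) {
    # split ' ' splits on runs of whitespace, so no empty "words" are counted.
    $h{lc $_}++ for split ' ';
}
# Print "count<TAB>word", most frequent first, ties in alphabetical order.
print "$h{$_}\t$_\n" for sort { $h{$b} <=> $h{$a} || $a cmp $b } keys %h;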

---------------------------------------------

Thanks for the work break; it was better for me than a cookie. I checked
the output of both scripts against poe.out with diff, and both were OK.
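Besides diffing against poe.out, the counting and sorting that both scripts share can be checked on a tiny hand-made input. The sketch below is one way to do that, not part of the original scripts: it pulls the logic into a hypothetical word_freq subroutine that takes lines as arguments instead of reading <>. It uses split ' ' (runs of whitespace, leading whitespace ignored) rather than split /\s/, which counts an empty string as a "word" whenever two whitespace characters are adjacent or a line starts with one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original scripts): the counting and
# sorting shared by both versions, taking lines as arguments instead of <>.
sub word_freq {
    my %h;    # lowercased word => count
    for my $line (@_) {
        # split ' ' splits on runs of whitespace and skips leading
        # whitespace, so no empty "words" are counted.
        $h{lc $_}++ for split ' ', $line;
    }
    # Build "count<TAB>word" lines, most frequent first, ties alphabetical.
    my @out;
    for my $w (sort { $h{$b} <=> $h{$a} || $a cmp $b } keys %h) {
        push @out, "$h{$w}\t$w\n";
    }
    return @out;
}

# Made-up input with a doubled space, a leading space, and mixed case.
my @lines = word_freq("The raven  said\n", " the RAVEN\n");
print @lines;
```

Running it prints three tab-separated lines: 2 raven, 2 the, and 1 said. The two words tied at 2 come out alphabetically because of the || $a cmp $b fallback in the sort.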

teddy
