Module WordFrequency '***************************************************************************** '***************************************************************************** '** Word Frequency project '** '** Written in response to Owen Astrachan's SIGCSE assignment, this program '** takes as command line input the name of a text file, reads the file, and '** counts each occurance of the white-space deliniated text. It then produces '** a report (to the file output.txt) of the word frequencies, from highest to '** lowest. '** '** After considering a number of approaches, I wrote this in VB.NET (2003), '** using several APCS techniques that I teach. For example, I used a '** Binary Search Tree to store the individual words, rather than a simple '** array. The overall efficiency of this code should be O(n*n) The BST is '** O(n*log n), but the merge into the LinkedList is O(n*n). I suppose that '** using another BST could improve the efficiency to O(n*log n), but would '** have required writing yet-another BST class. '** '** Overall, a lot of fun. '** '** Tom Indelicato 3/22/07 '***************************************************************************** '***************************************************************************** Private Class BinaryTree '***************************************************************************** '** BinaryTree Class '** '** An enhancement of the standard class to track occurances if words. '** Duplicates result in an increment of a Count field (initially set to 1). '** '** Also contains Resort (and a recursive ResortHelper) that extracts '** tabulated data via an InOrder traversal, and addes each word and its '** occurance count to a LinkedList, defined below. '** '** The private class BinaryTreeNode is used to store both the word and its '** occurance count. '** '** Tom Indelicato 3/21/07 '***************************************************************************** Private Root As BinaryTreeNode Public Sub New() Root = Nothing End Sub Public Sub Add(ByVal Word As String) If Root Is Nothing Then Root = New BinaryTreeNode(Word) Root.Left = Nothing Root.Right = Nothing Else Root.AddWord(Word) End If End Sub Public Sub Resort() ResortHelper(Root) End Sub Private Sub ResortHelper(ByVal Here As BinaryTreeNode) If Here Is Nothing Then Return ResortHelper(Here.Left) LL.add(Here.Count, Here.Word) ResortHelper(Here.Right) End Sub Private Class BinaryTreeNode '***************************************************************************** '** BinaryTreeNode Class '** '** An enhancement of the standard class to track occurances if words. '** Duplicates result in an increment of a Count field (initially set to 1). '** '** Tom Indelicato 3/21/07 '***************************************************************************** Private mWord As String Private mCount As Integer Private mLeft As BinaryTreeNode Private mRight As BinaryTreeNode Public Sub New(ByVal Value As String) Word = Value Count = 1 Right = Nothing Left = Nothing End Sub Public Property Word() Get Return mWord End Get Set(ByVal Value) mWord = Value End Set End Property Public Property Count() Get Return mCount End Get Set(ByVal Value) mCount = Value End Set End Property Public Property Right() As BinaryTreeNode Get Return mRight End Get Set(ByVal Value As BinaryTreeNode) mRight = Value End Set End Property Public Property Left() As BinaryTreeNode Get Return mLeft End Get Set(ByVal Value As BinaryTreeNode) mLeft = Value End Set End Property Public Sub AddWord(ByVal Word) If mWord = Word Then mCount += 1 Else If mWord > Word Then If mLeft Is Nothing Then mLeft = New BinaryTreeNode(Word) Else mLeft.AddWord(Word) End If Else If mRight Is Nothing Then mRight = New BinaryTreeNode(Word) Else mRight.AddWord(Word) End If End If End If End Sub End Class End Class Private Class LinkedList '***************************************************************************** '** LinkedList Class '** '** An enhancement of the standard class to track occurances if words. Each '** node of the list (defined below) stores both a word and its occurance '** count. Additions to the list are sorted by occurance; duplicate '** occurrance counts are added to end of a given count range. (This is to '** allow the words, entered alphabetically, to remain in alphabetical order.) '** '** Also contains a Print method, which returns the required output in a String. '** '** The private class ListNode is used to store both the word and its '** occurance count. '** '** Tom Indelicato 3/21/07 '***************************************************************************** Private Root As ListNode Public Sub New() Root = Nothing End Sub Public Sub add(ByVal Count As Integer, ByVal Word As String) If Root Is Nothing Then Root = New ListNode(Count, Word) Else Dim NewNode As New ListNode(Count, Word) If Root.Count < Count Then NewNode.NextNode = Root Root = NewNode Else Dim Here As ListNode = Root Dim Prev As ListNode = Nothing Do While (Not (Here Is Nothing)) AndAlso (Here.Count >= Count) Prev = Here Here = Here.NextNode Loop NewNode.NextNode = Here Prev.NextNode = NewNode End If End If End Sub Public Function Print() As String Dim strOutput As String Dim Here As ListNode = Root Do While Not Here Is Nothing strOutput = strOutput & Here.Count & vbTab & Here.Word & vbNewLine Here = Here.NextNode Loop Return strOutput End Function Private Class ListNode Private mCount As Integer Private mWord As String Private mNext As ListNode Public Sub New(ByVal Count As Integer, ByVal Word As String) mCount = Count mWord = Word mNext = Nothing End Sub Public Property Word() Get Return mWord End Get Set(ByVal Value) mWord = Value End Set End Property Public Property Count() Get Return mCount End Get Set(ByVal Value) mCount = Value End Set End Property Public Property NextNode() Get Return mNext End Get Set(ByVal Value) mNext = Value End Set End Property End Class End Class Dim sr As System.IO.StreamReader Dim sw As System.IO.StreamWriter Dim BST As BinaryTree Dim LL As LinkedList Sub Main() InitializeStructures() System.Console.WriteLine("Word Frequency Report") System.Console.WriteLine() System.Console.WriteLine("Reading from file: " & Command()) System.Console.WriteLine("Writing to file: output.txt") ReadInputFile() WriteReport() System.Console.WriteLine("Done!") End Sub Private Sub InitializeStructures() sr = New System.IO.StreamReader(Command()) sw = New System.IO.StreamWriter("output.txt") BST = New BinaryTree LL = New LinkedList End Sub Private Sub ReadInputFile() Dim strTemp As String Dim I As Integer Do While sr.Peek > -1 strTemp = sr.ReadLine If strTemp.Length > 0 Then Dim strArr() As String = strTemp.Split For I = 0 To strArr.Length - 1 BST.Add(strArr(I).ToLower) Next End If Loop sr.Close() End Sub Private Sub WriteReport() BST.Resort() sw.Write(LL.Print) sw.Close() End Sub End Module