Proxycizer 0.9 source code walk-through
The purpose of this document is to give an overview of the code layout and programming style of the package, and to make it easier for anyone (including myself) to extend it. If there isn't much documentation on this page for a particular class or program, chances are you'll find plenty of comments in the source, so please look there too. This document stays pretty high level, and describes the relationship between classes, not necessarily the implementation or interfaces themselves.
The code is split between two directories:
- libsrc, which holds the code for the Proxycizer library (nearly all classes used in the package), and
- progsrc, which contains the stand-alone programs' top-level procedures (such as main).
libsrc/
Proxy classes
Not referring to the Proxy Class pattern, but to a set of classes that simulate web proxy-ish behavior.
- libsrc/Proxy.h
- All classes below inherit from the abstract base class Proxy, which mainly exports Proxy_request and Proxy_response types, and a Proxy::Request() method to accept requests. The Proxy class itself does nothing. It merely provides a common interface; derived classes must do all the work. A ProxyStats object is included in this class; most derived classes add their own stats to this so just calling the ProxyStats::Print() method on it will print out stats for all superclasses of the object.
- libsrc/ProxyCache.{h,cc}
- Sticks a basic cache with LRU replacement onto the proxy.
- libsrc/ProxyCacheQueryable.{h,cc}
- Derived from ProxyCache, this class adds a querying mechanism (using the Query method), so that other caches may find out if this cache is holding a specific object.
- libsrc/ProxyDB.h
- libsrc/ProxyCRISP.{h,cc}
- libsrc/ProxyHarvest.{h,cc}
- libsrc/ProxyRD.{h,cc}
- libsrc/ProxyRPDSD.{h,cc}
- libsrc/ProxyRPDSDCS.{h,cc}
- libsrc/ProxyRPDSDMS.{h,cc}
- The "bottom-level" (final, to use a Java term) classes that complete the implementations of specific proxy simulators. ProxyDB is a special class that uses a db file to return an object of the correct size.
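As a rough illustration of the layering described above, here is a minimal sketch of an abstract base with a Request() method and a derived class that adds an LRU cache and a Query() method. Every name in it (SketchProxy, SketchCache, the string URL key, the capacity parameter, the boolean hit/miss return value) is invented for this example and is not the Proxycizer interface; the real classes work with the Proxy_request and Proxy_response types.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the abstract Proxy base class: it only fixes
// the interface, and derived classes do all the work.
class SketchProxy {
public:
    virtual ~SketchProxy() {}
    // Returns true on a cache hit, false on a miss (an assumption made for
    // this sketch; the real Request() uses request/response types).
    virtual bool Request(const std::string &url) = 0;
};

// Hypothetical derived class that adds an LRU cache plus a Query() method.
class SketchCache : public SketchProxy {
public:
    explicit SketchCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    bool Request(const std::string &url) override {
        auto it = index_.find(url);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            ++hits_;
            return true;
        }
        // Miss: evict the least recently used entry if the cache is full.
        if (order_.size() == capacity_) {
            index_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(url);
        index_[url] = order_.begin();
        ++misses_;
        return false;
    }

    // Lets another cache ask whether this one holds an object, without
    // disturbing the LRU order.
    bool Query(const std::string &url) const { return index_.count(url) != 0; }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::size_t capacity_;
    std::list<std::string> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

The hit and miss counters stand in for the per-class statistics; in the real package those would presumably be added to the shared ProxyStats object instead.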
Trace readers
These classes are used to iterate over traces.
- libsrc/TraceReader.{h,cc}
- This base class takes care of opening files/pipes and providing iterator methods. It should be templated on a type (typically derived from LogEntry) that provides the method GetOps(), which should return a pointer to a new object of type LogEntryOps. If constructed using a filename, it will also accept compressed files (assuming the file has an extension .gz).
- libsrc/TraceReaderSequence.{h,cc}
- Use this class to treat several proxy trace logs as one large (concatenated) log. The TraceReaderSequence::Open() method can be called more than once to set up a series of log files. TraceReaderSequence is a TraceReader, and acts just like one. Even First() works correctly.
- libsrc/BufferedTraceReader.{h,cc}
- BufferedTraceReader keeps a buffer cache of entries while walking through the log. The "B-" versions of the iterator methods add elements to the buffer cache, and allow the user to traverse the buffer cache, reaccessing previously seen elements. Elements are deleted with BufferedTraceReader::BDelete; any element in the buffer cache can be deleted using this method.
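To make the "templated on a type that provides GetOps()" idea and the concatenation behaviour of TraceReaderSequence more concrete, here is a small sketch. It is not the real code: SketchEntry, SketchOps, SketchReaderSequence, Next(), the "time url" line format, and the choice to make GetOps() static are all assumptions made for illustration, and it reads from in-memory streams rather than files or pipes.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

struct SketchOps;  // hypothetical operations class, defined below

// Hypothetical entry type: plain data plus a GetOps() factory.
struct SketchEntry {
    long time = 0;
    std::string url;
    static SketchOps *GetOps();  // returns a pointer to a new ops object
};

// Knows how to parse one "time url" entry from a stream.
struct SketchOps {
    bool Read(SketchEntry &e, std::istream &is) {
        return static_cast<bool>(is >> e.time >> e.url);
    }
};

inline SketchOps *SketchEntry::GetOps() { return new SketchOps; }

// Treats several inputs as one concatenated log. The ops type is deduced
// from whatever Entry::GetOps() returns.
template <class Entry>
class SketchReaderSequence {
    typedef typename std::remove_pointer<decltype(Entry::GetOps())>::type Ops;

public:
    SketchReaderSequence() : ops_(Entry::GetOps()), current_(0) {
        assert(ops_ != nullptr);
    }

    // May be called more than once to queue up a series of inputs.
    void Open(std::istream &is) { inputs_.push_back(&is); }

    // Reads the next entry, moving on to the next input when one runs dry.
    bool Next(Entry &e) {
        while (current_ < inputs_.size()) {
            if (ops_->Read(e, *inputs_[current_])) return true;
            ++current_;
        }
        return false;
    }

private:
    std::unique_ptr<Ops> ops_;            // one long-lived ops object
    std::vector<std::istream *> inputs_;  // not owned
    std::size_t current_;
};

// Usage sketch: count the entries in two in-memory "logs" read as one.
inline int CountEntries(const std::string &first, const std::string &second) {
    std::istringstream a(first), b(second);
    SketchReaderSequence<SketchEntry> seq;
    seq.Open(a);
    seq.Open(b);
    SketchEntry e;
    int n = 0;
    while (seq.Next(e)) ++n;
    return n;
}
```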
Proxy trace writers
- libsrc/TraceWriter.{h,cc}
- This is fairly self-explanatory. Similar to TraceReader, but exports a Write() method to write an entry (of templated type, usually derived from LogEntry) to a file.
Log entry classes
The opening of files is done by the TraceReader classes described previously, but the reading and parsing of a log entry is delegated to these classes and their corresponding -Ops classes. The data classes are all derived from LogEntry. They represent the data contained in one log entry. The operation classes are all derived from LogEntryOps. These define a mechanism for reading/writing their corresponding data class from/to their external representations via a file/pipe/stream/buffer.
All "read" methods return a new LogEntry in *lepp if *lepp == NULL. Otherwise they assume that *lepp points to enough storage to hold the resulting LogEntry. UberLogEntrys are a special case. See UberLogEntry for details. A LogEntryOps class exports the following methods:
- ReadFirst(LogEntry **lepp, const void *data, size_t * len)
- Read(LogEntry **lepp, const void *data, size_t * len)
- ReadFirst(LogEntry **lepp, FILE *fp)
- Read(LogEntry **lepp, FILE *fp)
- ReadFirst(LogEntry **lepp, istream & is)
- Read(LogEntry **lepp, istream & is)
- WriteFirst(const LogEntry *lep, void *data, size_t * len)
- Write(const LogEntry *lep, void *data, size_t * len)
- WriteFirst(const LogEntry *lep, FILE *fp)
- Write(const LogEntry *lep, FILE *fp)
- WriteFirst(const LogEntry *lep, ostream & os)
- Write(const LogEntry *lep, ostream & os)
Each TraceReader will only use one LogEntryOps object, which may construct and allocate many LogEntrys. This enables the "long-lived" LogEntryOps object to keep state through iteration over the many LogEntrys in a file. See UberLogEntry for an example.
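The *lepp convention for the "read" methods, and the idea of one long-lived ops object keeping state across many entries, can be sketched as follows. This is only an illustration under stated assumptions: SketchLogEntry, SketchLogEntryOps, the "time url" line format, the boolean return value, and the entries_read() counter are all made up, and the sketch assumes a non-null *lepp points to an already constructed entry rather than to raw storage.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical data class: the contents of one log entry.
struct SketchLogEntry {
    long time = 0;
    std::string url;
};

// Hypothetical operations class for the data class above.
class SketchLogEntryOps {
public:
    // Returns true on success. If *lepp is null, a new entry is allocated
    // and the caller owns it. Otherwise *lepp is assumed to point to a
    // valid entry, which is overwritten in place.
    bool Read(SketchLogEntry **lepp, std::istream &is) {
        assert(lepp != nullptr);
        SketchLogEntry tmp;
        if (!(is >> tmp.time >> tmp.url)) return false;
        if (*lepp == nullptr) *lepp = new SketchLogEntry;
        **lepp = tmp;
        ++entries_read_;
        return true;
    }

    long entries_read() const { return entries_read_; }

private:
    long entries_read_ = 0;  // state that survives across entries
};

// Usage sketch: parse every entry in a string, reusing one allocation.
inline long SketchParseAll(const std::string &text) {
    std::istringstream in(text);
    SketchLogEntryOps ops;
    SketchLogEntry *entry = nullptr;   // the first Read() allocates
    while (ops.Read(&entry, in)) {}    // later Reads reuse the storage
    delete entry;
    return ops.entries_read();
}
```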
- libsrc/Mergeable.h
- libsrc/LogEntry.h
- libsrc/LogEntryText.{h,cc}
These are the abstract LogEntry classes. LogEntry inherits a comparison operator, operator <(const Mergeable &), from Mergeable, which uses the _orderval member so that any set of objects from any of these classes can be ordered (by time, by default).
LogEntryText and LogEntryTextOps provide some common infrastructure for reading text-based log files (meaning all but DEC and UCB log files).
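The ordering that LogEntry inherits from Mergeable can be pictured with the sketch below, in which entries of two different types are sorted together through the base-class comparison. The class names, the double type chosen for _orderval, the orderval() accessor, and the SortByOrder() helper are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Mergeable: owns the ordering value and the
// comparison operator that every derived entry type shares.
class SketchMergeable {
public:
    explicit SketchMergeable(double orderval) : _orderval(orderval) {}
    virtual ~SketchMergeable() {}
    bool operator<(const SketchMergeable &other) const {
        return _orderval < other._orderval;
    }
    double orderval() const { return _orderval; }

protected:
    double _orderval;  // a timestamp by default (type assumed here)
};

// Two unrelated hypothetical entry types that both derive from the base.
class SketchSquidEntry : public SketchMergeable {
public:
    explicit SketchSquidEntry(double time) : SketchMergeable(time) {}
};

class SketchDECEntry : public SketchMergeable {
public:
    explicit SketchDECEntry(double time) : SketchMergeable(time) {}
};

// Sorts a mixed collection of entries using only the inherited operator<.
inline void SortByOrder(std::vector<const SketchMergeable *> &v) {
    for (std::size_t i = 0; i < v.size(); ++i) assert(v[i] != nullptr);
    std::sort(v.begin(), v.end(),
              [](const SketchMergeable *a, const SketchMergeable *b) {
                  return *a < *b;
              });
}
```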
- libsrc/LogEntryCrispySquidAcc.{h,cc}
- libsrc/LogEntryDEC.{h,cc}
- libsrc/LogEntryHarvestAcc.{h,cc}
- libsrc/LogEntryHarvestHier.{h,cc}
- libsrc/LogEntrySimclient.{h,cc}
- libsrc/LogEntrySquidAcc.{h,cc}
- libsrc/LogEntryUCB.{h,cc}
These files contain the data and operation classes for each supported log type.
- libsrc/UberLogEntry.{h,cc}
The UberLogEntryOps class attempts to automatically determine what type of trace it is looking at during the first read attempt, and generates the correct type of UberLogEntry-derived log entry objects. If it can't determine the type from the first 8K of the file, it bails with an error message. This class can usually detect Squid (Crispy or not) access logs, Harvest (Crispy or not) access/hierarchy logs, simclient logs, UCB Home-IP logs, and defaults to DEC logs when all else fails.
The "read" methods in UberLogEntryOps work a little differently than most LogEntryOps classes. UberLogEntrys store a pointer to a "real" LogEntry of the type automatically determined on the first read. If the LogEntry **lepp argument in a "read" method points to NULL, a new "real" LogEntry is allocated. However, if it points to valid storage, then that storage is used for the "real" LogEntry, not for the enclosing UberLogEntry. A new UberLogEntry is allocated in every case, and is returned in *lepp. So the value of *lepp always changes on success.
For those that wish to specify the storage for the UberLogEntry, too, the copy(void * uledata, void * realdata) method will copy the current UberLogEntry into storage pointed to by uledata and the encapsulated "real" LogEntry will be copied into storage pointed to by realdata. copy() assumes that the pointers are valid and that the storage they point to is large enough to store the resulting data structures. See sortproxytrace.cc for an example of why this may be useful.
Statistics
- libsrc/Stats.h
Based on Jason Kastner's perl module Statistics::Descriptive.pm, this class accepts data and will spit out various stats on demand. Stats throws away data after it has calculated and saved the info that it needs. FullStats, however, saves the given data, and enables other statistics, such as Median, Mode, and histograms.
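The split between Stats (which discards data) and FullStats (which keeps it) can be sketched roughly as follows. The class and method names are invented, and the running-variance update used here (Welford's method) is my own choice for the sketch, not necessarily what Stats.h does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Keeps only running summaries; each data point is discarded after Add().
class SketchStats {
public:
    virtual ~SketchStats() {}

    virtual void Add(double x) {
        ++count_;
        double delta = x - mean_;
        mean_ += delta / count_;
        m2_ += delta * (x - mean_);  // running sum of squared deviations
    }

    std::size_t Count() const { return count_; }
    double Mean() const { assert(count_ > 0); return mean_; }
    // Sample variance (divides by n - 1).
    double Variance() const { assert(count_ > 1); return m2_ / (count_ - 1); }
    double StdDev() const { return std::sqrt(Variance()); }

private:
    std::size_t count_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;
};

// Also saves the data, which makes order statistics such as the median
// possible.
class SketchFullStats : public SketchStats {
public:
    void Add(double x) override {
        SketchStats::Add(x);
        data_.push_back(x);
    }

    double Median() const {
        assert(!data_.empty());
        std::vector<double> sorted(data_);
        std::sort(sorted.begin(), sorted.end());
        std::size_t n = sorted.size();
        return (n % 2 == 1) ? sorted[n / 2]
                            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

private:
    std::vector<double> data_;
};
```

The trade-off is memory: the summary-only class uses constant space however many values it sees, while the full version grows with the data.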
- libsrc/ProxyStats.h
This class is used by the Proxy class and its subclasses. It provides a way to store (groups of) counters, and a Print() function to print them out nicely. Counters in the same "bracket" (group) are printed separately from other brackets, and counters are also labeled with their fraction (percentage) of the sum of all the other counters in the bracket.
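A rough sketch of grouped counters with per-bracket percentages is given below. The names (SketchProxyStats, Add, Percent, ToString) and the output layout are invented, and the sketch assumes each percentage is taken relative to the total of all counters in the same bracket, which is one reading of the description above.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

class SketchProxyStats {
public:
    // Adds n to the named counter inside the named bracket (group).
    void Add(const std::string &bracket, const std::string &name, long n = 1) {
        assert(n >= 0);
        brackets_[bracket][name] += n;
    }

    // Percentage of the bracket total held by one counter; 0 if unknown.
    double Percent(const std::string &bracket, const std::string &name) const {
        auto b = brackets_.find(bracket);
        if (b == brackets_.end()) return 0.0;
        auto c = b->second.find(name);
        if (c == b->second.end()) return 0.0;
        long total = 0;
        for (const auto &kv : b->second) total += kv.second;
        return total == 0 ? 0.0 : 100.0 * c->second / total;
    }

    // Prints each bracket separately, with every counter labeled by its
    // share of that bracket.
    void Print(std::ostream &os) const {
        for (const auto &b : brackets_) {
            os << "[" << b.first << "]\n";
            for (const auto &c : b.second) {
                os << "  " << c.first << ": " << c.second << " ("
                   << Percent(b.first, c.first) << "%)\n";
            }
        }
    }

    std::string ToString() const {
        std::ostringstream os;
        Print(os);
        return os.str();
    }

private:
    std::map<std::string, std::map<std::string, long> > brackets_;
};
```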
Miscellaneous utilities
Many of these classes are provided courtesy of Owen Astrachan. No doubt, a few of these have gone through enough modification to be totally unrecognizable to him.
- libsrc/Pair.h
- libsrc/LList.{h,cc}
- libsrc/DLList.{h,cc}
- libsrc/Table.h
- libsrc/HTable.{h,cc}
- libsrc/Iterator.h
- libsrc/HIterator.{h,cc}
- libsrc/Vector.h
Miscellaneous utilities, part 2
- libsrc/EventManager.{h,cc}
Used by simclient and webulator, this class provides an event-driven infrastructure, where events can be triggered by file events or at given times.
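The time-triggered half of such an event-driven infrastructure can be sketched with a priority queue of callbacks, as below. This is an assumption-laden illustration: SketchEventManager, Schedule(), Run(), and Now() are invented names, the clock is a simulated one rather than wall-clock time, and file-triggered events are left out entirely.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

class SketchEventManager {
public:
    typedef std::function<void(double)> Callback;

    // Registers a callback to fire at simulated time `when`.
    void Schedule(double when, Callback cb) {
        assert(when >= now_);
        queue_.push(Event{when, seq_++, cb});
    }

    // Fires events in time order until none remain. Callbacks may schedule
    // further events while the loop is running.
    void Run() {
        while (!queue_.empty()) {
            Event ev = queue_.top();
            queue_.pop();
            now_ = ev.when;
            ev.cb(now_);
        }
    }

    double Now() const { return now_; }

private:
    struct Event {
        double when;
        unsigned long seq;  // tie-breaker: earlier registrations fire first
        Callback cb;
    };

    // Orders the queue so that the earliest event is on top.
    struct Later {
        bool operator()(const Event &a, const Event &b) const {
            if (a.when != b.when) return a.when > b.when;
            return a.seq > b.seq;
        }
    };

    std::priority_queue<Event, std::vector<Event>, Later> queue_;
    double now_ = 0.0;
    unsigned long seq_ = 0;
};
```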
- libsrc/DNSCache.{h,cc}
Used by some Proxy classes because some systems don't provide DNS caches of their own, and generate DNS queries on every gethostbyname().
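The caching idea is simple enough to sketch in a few lines. In the version below the actual lookup is passed in as a function so that the example needs no network access; SketchDNSCache, Lookup(), and the Resolver type are hypothetical names, and a real cache would call gethostbyname() (or similar) and would probably also want some expiry policy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

class SketchDNSCache {
public:
    // The resolver stands in for the expensive system lookup.
    typedef std::function<std::string(const std::string &)> Resolver;

    explicit SketchDNSCache(Resolver resolver) : resolve_(resolver) {
        assert(static_cast<bool>(resolve_));
    }

    // Returns the cached answer if there is one; otherwise resolves the
    // name once and remembers the result.
    const std::string &Lookup(const std::string &host) {
        auto it = cache_.find(host);
        if (it == cache_.end()) {
            it = cache_.emplace(host, resolve_(host)).first;
        }
        return it->second;
    }

    std::size_t Size() const { return cache_.size(); }

private:
    Resolver resolve_;
    std::unordered_map<std::string, std::string> cache_;
};
```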
- libsrc/Allocator.h
Templated type-specific memory allocator/deallocator.
- libsrc/tentry_v2.{h,cc}
A wrapper for libsrc/proxytrace2txt_v2.h
- libsrc/exiterr.h
An exit() that returns the line number as the error code.
- libsrc/vtoh.h
Little-endian (VAX) to host byte-order functions.
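For readers unfamiliar with this kind of conversion, here is a minimal sketch of reading little-endian values in a way that works on any host. The function names and the byte-pointer signatures are assumptions; the functions in vtoh.h may well look different.

```cpp
#include <cassert>
#include <cstdint>

// Assembles a 32-bit value from four bytes stored least significant first.
// Building the value with shifts gives the same result on any host,
// regardless of the host's own byte order.
inline std::uint32_t SketchVtoh32(const unsigned char *p) {
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Same idea for a 16-bit value stored least significant byte first.
inline std::uint16_t SketchVtoh16(const unsigned char *p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```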
Currently unused classes
- libsrc/unused/Merger.cc
- libsrc/unused/Merger.h
- libsrc/unused/MergeableStream.h
- libsrc/unused/Stream.h
Syam Gadde Last modified: Fri Jun 11 15:47:52 EDT 1999