Proxycizer

Proxycizer is a suite of applications and C++ classes that can be used to simulate and/or drive web proxies. Proxycizer automatically recognizes several types of proxy traces, currently including Harvest, Squid, DEC, UCB, WorldCup98, CRISP, Crispy Squid, and several Proxycizer-specific formats.


Download

You may download the latest version of Proxycizer:


Source documentation

There is some source documentation available:


Tested Platforms

This software has been known to compile and run as is on:

Other platforms, such as mips-sgi-irix6.2 [CC -32, CC -64], will compile with slight modifications. If you have been able to compile and run this software on other platforms/systems, please let me know.

There are still bugs in this software, though they are disappearing pretty quickly. If you find one, let me know (fixes appreciated, too!).


Requirements

Compiling this package requires a good C++ compiler (g++ works well). In addition, some programs require extra libraries or tools. The configure script automatically detects which of these are missing and skips building the programs that depend on them. Here's more info on why you might need them and how to get them if you don't have them already:

GNU regex
http://www.gnu.org/

If your C library doesn't provide POSIX regular expression support (i.e., the regcomp and regexec functions), you need GNU's regex library. You may wish to get it anyway; some vendor-specific implementations are horrendously slow.
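
To see how the two POSIX functions fit together (or to check that your C library's versions work), here is a minimal C++ sketch. The helper name matches and the sample log lines are hypothetical; regcomp, regexec and regfree are the standard POSIX calls declared in <regex.h>, which GNU regex also provides.

```cpp
#include <cassert>
#include <regex.h>   // POSIX regcomp/regexec/regfree
#include <string>

// Hypothetical helper: returns true if `line` matches the POSIX extended
// regular expression `pattern`; returns false if the pattern fails to compile.
bool matches(const std::string& pattern, const std::string& line) {
    regex_t re;
    // REG_EXTENDED: extended syntax; REG_NOSUB: we only need match/no-match.
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }
    int rc = regexec(&re, line.c_str(), 0, nullptr, 0);
    regfree(&re);  // release the memory allocated by regcomp
    return rc == 0;
}
```

For example, matches("\\.gif$", line) asks whether a (made-up) request line ends in ".gif". Compiling the pattern once and reusing the regex_t across many log lines would be cheaper than recompiling per call, which matters when scanning large traces.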

db
http://www.sleepycat.com/db/

db is used to store traces in a disk database for efficient retrieval of log entries indexed by URL. webulator and ProxyDB can take advantage of this library.
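
Conceptually, such a database maps each URL to the log entries that requested it. The sketch below shows only that access pattern, with an in-memory std::multimap standing in for the disk database; it does not use the Berkeley DB API, and the LogEntry fields and UrlIndex class are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified log entry; real trace entries carry more fields.
struct LogEntry {
    double timestamp;    // seconds
    std::string client;  // client identifier
    std::string url;     // requested URL (the index key)
    long length;         // object size in bytes
};

// In-memory stand-in for a URL-keyed disk database. NOT the Berkeley DB API.
class UrlIndex {
public:
    void insert(const LogEntry& e) {
        assert(!e.url.empty());  // the URL is the key, so it must be present
        index_.emplace(e.url, e);
    }

    // Returns every entry recorded for `url`, in insertion order
    // (std::multimap keeps equal keys in the order they were inserted).
    std::vector<LogEntry> lookup(const std::string& url) const {
        std::vector<LogEntry> out;
        auto range = index_.equal_range(url);
        for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
        return out;
    }

private:
    std::multimap<std::string, LogEntry> index_;
};
```

A proxy like webulator would, for example, look up the requested URL and use the stored length to decide how large an object to return; the disk database gives the same lookup without holding the whole trace in memory.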

TPIE
http://www.cs.duke.edu/~tpie/

TPIE is an I/O library used to transparently perform out-of-core computation on arbitrary data. Since many proxy traces (when uncompressed) don't fit in physical memory, sortproxytrace uses TPIE to enable sorting of large traces.
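
The technique TPIE makes transparent here is external (out-of-core) sorting. Below is a minimal sketch of a two-phase external merge sort, assuming the sort key is a 64-bit integer such as a timestamp; it does not use TPIE's API, and external_sort is a hypothetical name. To keep the example short, the input is passed in as a vector, whereas a real tool would read the trace file one chunk at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Phase 1: sort chunks of at most `run_size` keys in memory and write each
// sorted run to a temporary file.
// Phase 2: k-way merge the runs with a min-heap holding one key per run.
std::vector<long long> external_sort(const std::vector<long long>& input, std::size_t run_size) {
    assert(run_size > 0);
    std::vector<std::FILE*> runs;

    for (std::size_t i = 0; i < input.size(); i += run_size) {
        std::size_t end = std::min(input.size(), i + run_size);
        std::vector<long long> chunk(input.begin() + i, input.begin() + end);
        std::sort(chunk.begin(), chunk.end());
        std::FILE* f = std::tmpfile();
        assert(f != nullptr);
        std::fwrite(chunk.data(), sizeof(long long), chunk.size(), f);
        std::rewind(f);  // reposition so phase 2 reads the run from the start
        runs.push_back(f);
    }

    // Heap entries are (key, run index); std::greater puts the smallest key on top.
    using Item = std::pair<long long, std::size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    long long key;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (std::fread(&key, sizeof key, 1, runs[r]) == 1) heap.push({key, r});

    std::vector<long long> output;  // a real tool would stream this to a file
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        output.push_back(top.first);
        // Refill from the run the smallest key came from.
        if (std::fread(&key, sizeof key, 1, runs[top.second]) == 1) heap.push({key, top.second});
    }
    for (std::FILE* f : runs) std::fclose(f);  // tmpfile() files are removed on close
    return output;
}
```

Apart from the output, only one run's worth of keys (during phase 1) or one key per run (during phase 2) needs to be in memory at a time, so run_size would be chosen to fit comfortably in physical memory.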

Perl
http://www.perl.com/ or http://www.perl.org/

The original versions of most of these tools were written in Perl. Some vestiges are left in the scripts directory, but not many.


Compiling

To build the entire package, first create a build directory, for example in the top-level Proxycizer directory:

  mkdir `./config.guess`
This creates a new directory named after your system type, such as i386-unknown-freebsd4.0. Now change to that directory:
  cd i386-unknown-freebsd4.0
and within that directory do
  ../configure
This will test aspects of your system that Proxycizer depends on, and will create the necessary Makefiles. Typing
  make
should compile the package, assuming your make understands VPATH (as all good recent makes do). If your make doesn't succeed, try gmake, and if all else fails, mail me.

If ./configure can't find libraries (such as TPIE) because they are installed outside the standard places, point it at them with this option:

--with-addprefix=... specify additional search prefixes, separated
                     by ':' (colon); will add -I<prefix>/include and
                     -L<prefix>/lib to compile and link commands

To see all options to ./configure, type ./configure --help.

This package contains many example programs, and statically linking libproxycizer.a into all of them can take up lots of disk space. If you know how to build shared libraries on your system, you can modify the Makefiles to produce a single shared library used by all the Proxycizer applications.


Some of the programs in this package

Note: this list is out of date.
Trace characterization/simulation
Name | Required/optional extras | Description
clientrates | none | Calculates rate distributions of clients in the trace (e.g., the maximum and average client rate, mass function, etc.).
countclients | none | Counts the number of clients in the trace.
runproxies | none | Simulates various proxy types and configurations, runs a trace through them, and reports hit ratios and other statistics for all proxies.
distances | none | Weirdly named utility that can simulate a large number of proxy configurations and cache sizes. It can also determine "distances" (elapsed time) between different classes of references to the same object.
objectsizes | none | Calculates the distribution of object sizes in the trace, similar to 'clientrates'.
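
The core loop of a trace-driven proxy simulation such as the one 'runproxies' performs can be sketched as follows. This is a minimal sketch, not Proxycizer's classes: it assumes a single cache with LRU replacement and a capacity in bytes, and the names Request, LruCache and hit_ratio are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

struct Request {
    std::string url;
    long size;  // object size in bytes
};

class LruCache {
public:
    explicit LruCache(long capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Returns true on a hit. On a miss, caches the object if it can ever fit,
    // evicting least-recently-used objects until there is room.
    bool access(const Request& r) {
        auto it = index_.find(r.url);
        if (it != index_.end()) {
            order_.splice(order_.begin(), order_, it->second);  // mark most recently used
            return true;
        }
        if (r.size > capacity_) return false;  // larger than the whole cache
        while (used_ + r.size > capacity_) {
            const Request& victim = order_.back();  // least recently used
            used_ -= victim.size;
            index_.erase(victim.url);
            order_.pop_back();
        }
        order_.push_front(r);
        index_[r.url] = order_.begin();
        used_ += r.size;
        return false;
    }

private:
    long capacity_;
    long used_ = 0;
    std::list<Request> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Request>::iterator> index_;
};

// Replays `trace` through one cache and returns hits / requests.
double hit_ratio(const std::vector<Request>& trace, long capacity) {
    if (trace.empty()) return 0.0;
    LruCache cache(capacity);
    long hits = 0;
    for (const Request& r : trace)
        if (cache.access(r)) ++hits;
    return static_cast<double>(hits) / trace.size();
}
```

Running the same trace through several capacities (or several policies) and comparing hit ratios is the kind of sweep the simulation tools above report on; a byte hit ratio would weight each hit by r.size instead of counting requests.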
Proxy drivers
Name | Required/optional extras | Description
simclient | none | A non-blocking (asynchronous I/O) client simulator used to stream requests to an HTTP proxy. It can model the real-time request stream given in a proxy trace; the maximum number of simultaneous connections is limited by the number of file descriptors a process may have open.
webulator | db [opt.] | A non-blocking HTTP proxy that returns objects (consisting of '*' characters) on request. It can return dummy objects of a uniform size, or objects based on a database of the type generated by 'proxylog2db' (i.e., using each entry's length field to determine the object's length). If you don't have db, you can only use a database of fixed-size log entries (such as DEC logs). As of version 0.9, webulator can also model delays given in the trace.
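
Two details of the drivers above can be sketched in a few lines. This is a hedged illustration, not code from simclient or webulator: the function names are hypothetical, reserving three descriptors for stdin/stdout/stderr is an assumption, and the minimal HTTP/1.0 header is an assumed response format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE

// Each open socket consumes a file descriptor, so the soft RLIMIT_NOFILE
// limit bounds the number of simultaneous connections. Returns -1 if the
// limit can't be queried or is unlimited.
long max_connections(long reserved = 3) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY) return -1;
    long limit = static_cast<long>(rl.rlim_cur) - reserved;
    return limit > 0 ? limit : 0;
}

// Dummy object body: `length` '*' characters. The caller chooses `length`,
// e.g. a fixed size or the length field of a trace entry.
std::string dummy_body(std::size_t length) {
    return std::string(length, '*');
}

// Assumed minimal response wrapping the dummy body.
std::string dummy_response(std::size_t length) {
    return "HTTP/1.0 200 OK\r\nContent-Length: " + std::to_string(length) + "\r\n\r\n" +
           dummy_body(length);
}
```

Raising the soft limit (up to the hard limit, e.g. with the shell's ulimit) is the usual way to allow more simultaneous connections.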
Miscellaneous utilities
Name | Required/optional extras | Description
calchits | Perl | Auxiliary program to 'correlate'; prints information about the data in a 'correlate' output file. Located in the scripts/ directory.
correlate | none | Takes 'simclient' logs and Harvest or CRISP access and hierarchy logs (including logs from multiple proxies) and produces a one-line summary of every request, e.g., the path the request took through the cache(s), whether it hit or missed, etc.
filterlogs | none | Filters logs through infinite per-client caches and writes the resulting stream (i.e., the remaining requests). ('distances' can do this on the fly, at the cost of increased memory usage and longer running time.)
log2txt | none | Generates a limited text representation of a trace, one entry per line.
probedbfile | db | Simple, fragile utility that probes a dbfile of the type generated by 'proxylog2db' for the given key. Assumes the database contains DEC proxy trace entries.
proxylog2db | db [opt.] | Reads in a trace and inserts all entries into a database. If db is not available, you can only use traces with fixed-size log entries (such as DEC traces).
sortproxytrace | TPIE | Sorts proxy trace file(s) on any of a set of user-specified fields. Uses TPIE to enable sorting of very large traces.
splitlogs | none | Splits logs by client. For example, it can take every 16th client in the trace and write only their requests to an output file.
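
The filtering done by 'filterlogs' and 'splitlogs' can be sketched as follows. This is a minimal sketch under stated assumptions, not the tools' actual code: Entry and both function names are hypothetical, the per-client cache ignores expiration and modification, and "every nth client" assumes clients are numbered in order of first appearance.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Entry {
    std::string client;
    std::string url;
};

// Infinite per-client caches: a request is dropped if the same client has
// already requested the same URL; only per-client misses remain.
std::vector<Entry> filter_per_client(const std::vector<Entry>& trace) {
    std::set<std::pair<std::string, std::string>> seen;  // (client, url) already cached
    std::vector<Entry> remaining;
    for (const Entry& e : trace) {
        // insert().second is true only if the pair was new, i.e. a miss.
        if (seen.insert({e.client, e.url}).second) remaining.push_back(e);
    }
    return remaining;
}

// Keeps only requests from clients whose first-appearance index is a
// multiple of n (n = 16 keeps every 16th client).
std::vector<Entry> every_nth_client(const std::vector<Entry>& trace, std::size_t n) {
    assert(n > 0);
    std::map<std::string, std::size_t> id;  // client -> first-appearance index
    std::vector<Entry> out;
    for (const Entry& e : trace) {
        // id.size() is evaluated before emplace, so a new client gets the next index.
        auto it = id.emplace(e.client, id.size()).first;
        if (it->second % n == 0) out.push_back(e);
    }
    return out;
}
```

The per-client filter needs memory proportional to the number of distinct (client, URL) pairs, which is why doing it once and saving the result can be cheaper than repeating it on the fly.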

Syam Gadde
Last modified: Fri Apr 27 16:48:54 EDT 2001