DUE: Monday 3/31 11:59pm
HOW TO SUBMIT: Submit the required files for all problems (see WHAT TO SUBMIT under each problem below) through WebSubmit. On the WebSubmit interface, make sure you select compsci290
and the appropriate homework number. You can submit multiple times, but please resubmit files for all problems each time.
WHAT TO SUBMIT: Nothing is required for this part.
Start up your VM and issue the following commands, which will download the necessary files and create a working directory for your assignment:
/opt/datacourse/sync.sh
cp -pr /opt/datacourse/assignments/hw11-template/ ~/hw11/
cd ~/hw11/
In this exercise you will edit the sample.py
scripts to implement different graph sampling routines. You will then use graph.py
to study the sampling distributions of the following three statistics (see lecture notes for definitions).
Calculating betweenness on the full, unsampled graph can take quite a while, so we've calculated the unsampled statistics for you:
Implement the following two sampling techniques discussed in class -- random_edge
and random_jump
. We have already implemented random_node
sampling for you in sample.py
. You can switch the sampling function used by editing graph.py
as follows (as long as the sampling functions take the same required arguments):
from sample import random_jump as sample_function
from sample import random_edge as sample_function
Note: Remember to run graph.py by using /usr/bin/python graph.py characters-fallback.tsv
.
This is exactly as it sounds: we randomly select an edge from the graph, and repeat.
HINT: Your sample size will likely have to be much larger than random_node.
Random jump sampling is a variant on random walk sampling. The idea is:
out_neighbors()
, and continueYour sample.py file with the above sampling techniques. In addition, a 1.txt file that answers the following questions: