Lab #2: More Relational Data Processing

DUE: Monday 1/26 11:59pm

HOW/WHAT TO SUBMIT: All files should be submitted through WebSubmit. Only one of your team members needs to submit on behalf of the team. On the WebSubmit interface, make sure you select compsci216 and the appropriate lab number. You can submit multiple times, but please have the same team member resubmit all required files each time. To earn class participation credit, submit a text file team.txt listing members of your team who are present at the lab. To earn extra credit for the lab challenge (Part 2), you must get your solution checked off in class, but you don't need to submit anything.

0. Getting Ready

This lab will be about the congress database that you worked with for Homework #2. Assuming that you have set up Homework #2 correctly, there is nothing more to do.

1. A Few More Tips

(Here, we assume that you haven't created indexes by yourself with CREATE INDEX.)

Remember the following query in Lecture #2 for computing the total duration of a person's service in the Congress? Run it on all legislators (past and present) and note how slow it is:

SELECT id, first_name, last_name, birthday,
       (SELECT SUM(end_date - start_date)
        FROM person_roles r
        WHERE r.person_id = p.id) AS duration
FROM persons p;

Now, issue the following SQL command:

CREATE INDEX ON person_roles(person_id);

Next, re-run the same SQL query above. (In psql, you can just press [UP] a couple of times to recall the previous commands.) See how fast it now runs? Appreciate what an index can do for you! Can you explain (intuitively) how the new index helps with this query? (Free feel free to create additional indexes to help with query performance for the reminder of this lab.)

Another trick: for a complex query, you can use the WITH clause to define a sequence of temporary views that each build onto other temporary views. Here is the syntax:

WITH tmp_view_1 AS
  (... query that defines tmp_view_1 ...),
tmp_view_2 AS
  (... query that defines tmp_view_2 ...),
tmp_view_3 AS
  (... query that defines tmp_view_2 ...)
-- your actual query then starts below:
SELECT *
FROM tmp_view_3
WHERE ...;

2. [Challenge] Fact-Checking using a Relational Database

For each of the "fact" below, use SQL queries to (1) verify whether it is technically correct, and (2) determine whether it is misleading. As soon as you finish checking one claim below, raise your hand to get your answered checked by the course staff.

(A) God Bless America Among all current members of the Congress, only 5 are Christians!

(B) John Kerry vs. Democrats In the 113th Congress (2013-2014), Senator John Kerry (D, MA) only agreed with his party leader Harry Reid (D, NV) a total of 4 times! In comparison, Kay Hagan (D, NC) agreed with Reid 575 times!

(C) Lucky 2/22 People born on February 22 are more likely to be elected to Congress than those born on any other day.

(D) Women on Abortion H. R. 7 of the 113th Congress is also known as the "No Taxpayer Funding for Abortion Act." (The bill was introduced in a previous session of the Congress and was passed by the House on Janary 28, 2014, but was never passed by the Senate.) Women were strongly against this bill---58 of them voted against it while only 19 of them voted for it.

Hint on locating the particular vote: Beware that each bill goes through a complex lifecycle with many votes related to it. In this case, we are looking specifically at a House vote (chamer='h') with type being "On Passage of the Bill" and question containing some variation of the string "H. R. 7" (e.g., 'HR7', 'H R 7', 'H. R. 7', etc.).

3. More Challenges If You Finish Early

This part is optional and purely for fun and practice---besides impressing the course staff you will not earn extra credits.

You might find the SQL construct NOT EXISTS useful. It is applied to a subquery and tests whether the subquery returns an empty result. For example, here is one way (instead of using MAX) to find the youngest person in persons:

SELECT * FROM persons p
WHERE birthday IS NOT NULL
AND NOT EXISTS (SELECT * FROM persons WHERE birthday > p.birthday);

(A) Find, over time and year by year, the ratio of females vs. males in House and Senate.

(B) For House and for Senate, find the longest streak of consecutive years when the percentage of females strictly increased year after year.

WHAT TO SUBMIT: Files 3A.sql and 3B.sql containing your queries, and text files 3A.out and 3B.out containing the query resluts. To explain your queries, you may put comments in SQL statements using --: everything on the same line followin -- will be ignored.