CS+: CompSci Projects Beyond the Classroom

CS+ is a ten-week summer program, open only to Duke undergraduates, in which students work on computer science research projects with faculty in a fast-paced but supportive community environment. Students work in teams of 3-4 and are jointly mentored by a faculty project lead and a graduate student mentor. The experience is meant as a rich entry point into computer science research and applications beyond the classroom.

Logistics:

  • The program will run for ten weeks from Tuesday, May 26, 2020 (the day after Memorial Day) through Friday, July 31, 2020.
  • The program will be run virtually, with the same basic logistics in terms of time commitment, stipend, dates, and expectations.
  • Participants will receive a stipend of $5,000 to cover living expenses. 
  • Applications will open in December 2019 and run through February 2020. First-round offers will begin in late January/early February.

If you have questions about the program or issues with the application, please email csplus@cs.duke.edu.

CS+ admissions are now closed for Summer 2020. We have filled all our projects and are no longer accepting new students.

FAQs:

What is the difference between Code+, Data+, and CS+?  All three “plus” programs share the same model: students collaborate in teams on a tech or data project for the same 10 weeks of the summer and receive a stipend of the same amount. We also partner to provide some common events (talks, social events, a final poster fair, etc.) in order to create a larger ecosystem of students working in tech and data over the summer; over 100 students participated across all three programs in 2019. Each program has its own application.

  • CS+ focuses on projects in computer science research and applications and is run by the Department of Computer Science. Project leads are typically computer science faculty.
  • Data+ focuses on interdisciplinary data science projects from all over the university, and is run by Rhodes I.I.D. in Gross Hall. Project leads are typically faculty from diverse areas of the university, with frequent additional participation from community and/or industry partners.
  • Code+ focuses on projects in software and product development, is run by Duke OIT, and takes place at the American Tobacco Campus in downtown Durham. Project leads are professional IT developers, and the emphasis is on gaining real-world development experience.

Do I apply to the program, or can I pick the projects I want to be a part of?  You can apply specifically to the projects and faculty of interest to you.

How much background do I need?  CS+ is intended for students who have some computer science experience, but students do not need to be computer science majors or rising seniors in order to apply. We welcome and encourage applications from rising 2nd and 3rd year students who have completed the introductory course sequence in computer science and whose skills and interests make them a good fit for the projects they apply to.

Summer 2020 Projects

Lead: Rong Ge

Description: Deep neural networks have significantly improved the performance of machine learning in many applications. However, understanding how deep neural networks are optimized remains challenging. The optimization landscape is very complicated, with potentially many local minima and saddle points. Further, even different globally optimal solutions may perform differently, and in practice people rely on the right training algorithm to select a reasonable one. In this project, we will investigate whether it is possible to visualize the objectives of neural networks and use these visualizations to make predictions. In particular, can we give a global visualization of an objective that can be used to prove the difficulty of optimization? Can we give a local visualization near a local optimum that allows us to predict its performance on the test dataset? Many of these attempts will build on recent research on the mode connectivity of neural networks, such as the work of Draxler et al. and Garipov et al., as well as on a preliminary theoretical understanding developed by the faculty lead.
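To give a flavor of the simplest kind of landscape visualization, here is a minimal sketch that evaluates a loss along the straight line between two solutions and reports how high the path rises above its endpoints (a one-dimensional "slice" of the landscape). It assumes a hypothetical two-parameter toy loss, `toy_loss`, with global minima at (1, 1) and (-1, -1), standing in for a trained network's loss; the mode-connectivity work cited above studies curved, optimized paths in real networks, which this sketch does not attempt.

```python
def toy_loss(theta):
    """Hypothetical nonconvex loss with global minima at (1, 1) and (-1, -1)."""
    x, y = theta
    return (x * y - 1.0) ** 2 + 0.1 * (x - y) ** 2


def interpolate(theta_a, theta_b, t):
    """Point at fraction t along the straight line from theta_a to theta_b."""
    return [(1 - t) * a + t * b for a, b in zip(theta_a, theta_b)]


def loss_along_line(loss_fn, theta_a, theta_b, num_points=11):
    """Sample (t, loss) pairs along the segment: a 1-D slice of the landscape."""
    ts = [i / (num_points - 1) for i in range(num_points)]
    return [(t, loss_fn(interpolate(theta_a, theta_b, t))) for t in ts]


def barrier_height(profile):
    """How far the path rises above the worse of its two endpoints."""
    endpoint_loss = max(profile[0][1], profile[-1][1])
    return max(loss for _, loss in profile) - endpoint_loss


profile = loss_along_line(toy_loss, [1.0, 1.0], [-1.0, -1.0])
for t, loss in profile:
    print(f"t={t:.1f}  loss={loss:.3f}")
print("barrier:", barrier_height(profile))
```

On this toy loss the straight path between the two minima passes through the origin, where the loss is 1.0, so the barrier is 1.0 even though both endpoints have zero loss; plots of such profiles are one simple way to summarize a landscape in low dimension.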

Outcomes: Students will implement deep neural networks on standard datasets (e.g., CIFAR-10) and find ways to summarize the properties of their complicated optimization landscapes using low-dimensional visualizations. The final product will be the results of a systematic experimental investigation, including the associated visualizations. The empirical observations from this project may help formalize theoretical conjectures about optimization and generalization in deep neural networks.

Skills: Students should have some exposure to machine learning and to college-level mathematics related to optimization, such as multivariable calculus and linear algebra. Experience with data visualization is a plus.

Lead: Joe Shamblin

Description: In this project, students will use a large dataset of bird images (the iNaturalist dataset) to create an SSD (single-shot detection) multi-box detector using TensorFlow. After the model has been created, trained, and validated, students will use it to identify individual bird species and estimate species populations from images captured by a camera attached to a single-board computer such as an NVIDIA Jetson or Google Coral. This project could become a valuable tool for bird groups around the country, such as the Audubon Society.

Outcomes: This project will involve significant applications of machine learning, computer vision, data processing, and population modeling. The final project will be presented on a website and will include a git repository with all code used in the project.

Skills: Students should have some exposure to Python, data parsing, and machine learning libraries. Math skills are also a plus.

Lead: Robert Duvall

Description: This project's goal is to create software that makes it easy to combine any variety of input and output hardware, accessible through Open Source APIs, to detect multimodal behavioral cues and react accordingly. For example, Alexa voice commands could be used to control smart lights as output; a device that detects gestures could be used to point to items on a standard restaurant menu or a projected image; or soothing music could be played when a Fitbit-style watch detects a high heart rate. A version of this project has been used successfully to help children with emotional disabilities.

Over the summer, we want to expand the kinds of hardware that can be combined and provide an intuitive interface that allows non-technical users to combine inputs with outputs. Students should be interested in learning about new technologies and be creative in coming up with demos that show off different kinds of combinations. Several non-technical groups are interested in this project: therapists working with people with special needs, science or children's museum exhibit designers, and escape room experience designers. We also welcome students' ideas for technologies to include and for example uses.

Outcomes: Students will learn about Open Source software, new kinds of devices, and developing interesting interactive demos. This project is written in Java, and we are looking to extend it to interact with more devices and to create more example scenarios. Additionally, a front end could be written to make the system more user-friendly, so that new scenarios would not all need to be explicitly programmed.

Skills: Students should be capable Java programmers and interested in learning how to develop with new technology.

Lead: Xiaowei Yang

Description: Both the field of AI and the field of computer networks have advanced significantly in recent years. In computer networks we have witnessed a plethora of new programmable devices, new network architectures, and smart devices. In AI, machine learning with deep neural networks has revolutionized many real-world applications. This project aims to study how AI can advance the field of networking and vice versa. Students will develop algorithms and implement software that apply AI and machine learning techniques to build intelligent solutions for programmable networked devices.

Outcomes: Students will learn how to do research in networking, how to apply AI techniques to improve networking, and how networking can help improve AI. In addition to implementation and testing, students will write a research or survey paper about their techniques at the end of the summer.

Skills: Students should have basic knowledge of AI and networking.

Lead: Kartik Nayak

Description: Consensus protocols have been studied for several decades, and blockchains have attracted intense research interest in recent years. Today, blockchains fall into one of two categories. Permissionless blockchains (colloquially, public blockchains) such as Bitcoin rely on longest-chain protocols to commit transactions; most cryptocurrencies today use this class of protocols. In contrast, traditional protocols are permissioned, i.e., the set of participating parties is fixed, and they rely on "votes" from these known parties to commit transactions. Blockchains used by companies and banks, e.g., Hyperledger Fabric (by IBM), CCF (by Microsoft), SBFT (by VMware), and HotStuff (by Calibra), rely on this class of protocols.

Currently, these are typically viewed as two entirely different approaches, each with its own strengths and weaknesses. We conjecture that the two approaches may share many similarities, and we want to create a unifying framework that captures both kinds of protocols. In this project, the goal is to design and implement a consensus protocol framework that will enable a simple blockchain solution with high throughput and low latency for a large number of applications.
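The two commit rules contrasted above can be caricatured in a few lines. This is a deliberately simplified sketch, not any real system's protocol: the fork-choice rule literally compares chain lengths (Bitcoin, for instance, compares accumulated work), `k` is a hypothetical confirmation depth, and the voting rule assumes the classic Byzantine fault-tolerant setting of n = 3f + 1 known parties with a quorum of 2f + 1. All function names are illustrative.

```python
def longest_chain(chains):
    """Simplified permissionless fork choice: adopt the longest known chain (first wins ties)."""
    return max(chains, key=len)


def committed_prefix(chain, k):
    """Treat every block except the last k as committed (hypothetical depth k)."""
    return chain[:-k] if k > 0 else list(chain)


def quorum_size(n):
    """Votes needed among n known parties tolerating f = (n - 1) // 3 faults: 2f + 1."""
    f = (n - 1) // 3
    return 2 * f + 1


def committed_by_votes(votes, parties):
    """Permissioned rule: commit once a quorum of *known* parties has voted."""
    valid = set(votes) & set(parties)  # votes from unknown parties are ignored
    return len(valid) >= quorum_size(len(parties))


forks = [["g", "a"], ["g", "a", "b", "c"], ["g", "x", "y"]]
chain = longest_chain(forks)
print(chain, committed_prefix(chain, 2))
parties = ["p1", "p2", "p3", "p4"]
print(committed_by_votes({"p1", "p2", "p3"}, parties))
print(committed_by_votes({"p1", "p2", "mallory"}, parties))
```

The contrast is visible even in this caricature: the longest-chain rule never reaches finality, only increasing confidence as `k` grows, while the voting rule commits as soon as enough identified parties agree.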

Outcomes: Students will get hands-on experience developing blockchains and consensus protocols, as well as understanding and testing the tradeoffs between different ones. The final product will be an open-source codebase and (potentially) a research paper.

Skills: Students should have programming experience. Experience with algorithms is also important, since understanding consensus protocols requires algorithmic thinking. Exposure to blockchains is a plus.

Lead: Kristin Stephens-Martinez

Description: One way to help students learn Python is to give them opportunities to practice reading code and predicting its output. This project focuses on developing a tool that provides students with code-tracing questions, where students are presented with code and asked to provide what Python will print when the code runs. We are considering multiple directions for continuing development, and which path we take will depend on who joins the team this summer! Directions include (but are not limited to): feature development on the front end or back end; creating and designing questions for different Python learning concepts; and analysis of preliminary data to understand how students are using the tool, where they are confused, and how best to help them learn from the tool.
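A code-tracing question pairs a short snippet with the output Python actually prints. As a hypothetical sketch of how a tool might check a student's prediction (the names `expected_output`, `check_answer`, and `QUESTION` are illustrative and not taken from the project's codebase), the snippet can be run with its standard output captured and compared to the student's answer. `exec` should only ever be used on trusted, instructor-written code.

```python
import contextlib
import io


def expected_output(code):
    """Run an instructor-written snippet and capture everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # fresh namespace; never exec untrusted code
    return buffer.getvalue()


def check_answer(code, student_answer):
    """Compare a student's predicted output with the real one, ignoring trailing whitespace."""
    return expected_output(code).rstrip() == student_answer.rstrip()


# A hypothetical question targeting a common misconception: list aliasing.
QUESTION = """
x = [1, 2]
y = x
y.append(3)
print(x)
"""

print(check_answer(QUESTION, "[1, 2, 3]"))
print(check_answer(QUESTION, "[1, 2]"))
```

A student who answers `[1, 2]` has likely assumed that `y = x` copies the list; recording which wrong answers are common is the kind of data the analysis direction above could use.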

Outcomes: Students will learn about the development of a web-based tool, how to approach and analyze a research question, and how to design course curriculum. By the end of the summer students will have contributed to a codebase for Python code-tracing.

Skills: Students should be proficient in Python 3. Students should also have experience or be willing to learn front-end or back-end web development and database management.

Leads: Jun Yang, Sudeepa Roy, Kristin Stephens-Martinez

Description: As organizations and people increasingly rely on data to draw insights and make decisions, data analysis skills become an essential part of any education. However, learning and debugging relational database queries can be challenging. This project seeks to develop frameworks, techniques, and tools that help novices learn and debug such queries. Specifically, we will develop frameworks for explaining why queries are wrong (with respect to reference queries and test database instances), for tracing the execution of queries over particular database instances, and for suggesting possible fixes. We will devise computational techniques for solving problems defined in these frameworks efficiently, at interactive speed. We will build practical tools for learning and debugging queries, for both relational algebra and full SQL, and evaluate their effectiveness in an educational setting. For more information, check out these research papers/demos on Explaining Wrong Queries Using Small Examples and RATest: Explaining Wrong Relational Queries Using Small Examples, as well as our RATest system, which has been adopted in CompSci 316/516 at Duke and is being considered by other universities.
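To illustrate what "wrong with respect to a reference query and a test database instance" means, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, both queries, and the helper `differing_rows` are hypothetical; the instance is hand-built rather than found automatically, and results are compared as sets, ignoring SQL's duplicate rows. The task is to list students not enrolled in CS316.

```python
import sqlite3


def build_instance():
    """A tiny hand-built test instance (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE enroll (sid INTEGER, course TEXT);
        INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO enroll VALUES (1, 'CS316'), (1, 'CS201');
        """
    )
    return conn


# Task: names of students NOT enrolled in CS316.
REFERENCE_SQL = """
SELECT name FROM student
WHERE id NOT IN (SELECT sid FROM enroll WHERE course = 'CS316')
"""
STUDENT_SQL = """
SELECT s.name FROM student s JOIN enroll e ON s.id = e.sid
WHERE e.course <> 'CS316'
"""


def differing_rows(conn, reference_sql, student_sql):
    """Rows the student query misses or adds, compared as sets."""
    ref = {row[0] for row in conn.execute(reference_sql)}
    got = {row[0] for row in conn.execute(student_sql)}
    return {"missing": sorted(ref - got), "extra": sorted(got - ref)}


conn = build_instance()
print(differing_rows(conn, REFERENCE_SQL, STUDENT_SQL))
```

This two-student instance already explains the bug: the student query wrongly returns Ann, who takes CS201 in addition to CS316, and misses Bob, who has no enrollments at all and so never survives the join.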

Outcomes: Students will gain an in-depth understanding of database query processing, mathematical logic, and human-computer interaction principles. By the end of the summer, students will have contributed toward a fully functioning system with a web front-end suitable for deployment in a classroom setting and possibly a draft research paper or demonstration proposal.

Skills: Students should have solid programming skills. Experience with relational database systems (e.g., CS 316) is desired. Web front-end/JavaScript experience is a plus.

Lead: Brandon Fain

Description: Much attention has been given recently to gerrymandering: the practice of drawing legislative districts in order to systematically promote or suppress the interests of some part of the electorate. In addition, researchers have become increasingly interested in technical questions surrounding algorithmic fairness and bias in machine learning tasks like classification and clustering, as these systems are increasingly used to make real-world decisions. In this project, students will apply ideas from algorithmic fairness in clustering to study fair and unbiased districting.

Districtings can be unfair in different ways. This project draws inspiration from the Voting Rights Act of 1965, which the Supreme Court has interpreted as implying (roughly) that the votes of racial and language minorities have been unfairly and illegally diluted if they would constitute a majority in a plausible district where their votes would produce a different winning party than in the districts to which they currently belong. In this project, we will explore algorithms to detect and correct for such violations in districting. Students will pull data from recent elections in the U.S. and develop and implement algorithms to efficiently search for these violations of fairness in districting. Students will also design algorithms for computing districtings that are fair with respect to such violations.
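The violation described above can be phrased as a simple check on one candidate district. The sketch below is a deliberate simplification under stated assumptions: precincts are hypothetical dictionaries holding total population, minority population, and votes per party; the candidate district is taken as "plausible" without any contiguity, compactness, or population-balance checks; and its winner is decided by all votes cast in it. Searching efficiently over candidate districts, the hard part of the project, is not shown.

```python
from collections import Counter


def winner(precincts):
    """Party with the most votes across the precincts (ties go to the alphabetically first)."""
    totals = Counter()
    for p in precincts:
        totals.update(p["votes"])
    return max(sorted(totals), key=lambda party: totals[party])


def minority_share(precincts):
    """Fraction of the combined population that belongs to the minority group."""
    total = sum(p["pop"] for p in precincts)
    return sum(p["minority_pop"] for p in precincts) / total


def is_violation(candidate, current_winner):
    """Flag a candidate district where the minority is a majority and the outcome flips."""
    return minority_share(candidate) > 0.5 and winner(candidate) != current_winner


# Hypothetical precincts; the minority group currently sits in districts won by "Y".
a = {"pop": 100, "minority_pop": 70, "votes": {"X": 60, "Y": 30}}
b = {"pop": 100, "minority_pop": 40, "votes": {"X": 30, "Y": 55}}

print(is_violation([a, b], current_winner="Y"))  # majority-minority, and X would win
print(is_violation([b], current_winner="Y"))     # minority share is only 40%
```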

Outcomes: Students will learn how to use large geometric datasets and design algorithms for hard search problems and clustering/districting on such data. The final product will be an open source implementation of the algorithms, along with a report detailing our findings.

Skills: Students should have some programming and mathematical background (at the level of CS 201, 230, or equivalents). Experience with or a desire to learn about algorithms and data science are a plus.

Lead: Mary Cummings

Description: US Navy aircraft carriers are very dangerous places, with significant risk of injuries and fatalities. This project will take a large, multi-year dataset of accidents across several aircraft carriers and apply various machine learning models to determine whether any critical relationships can be seen in the data.

Outcomes: Students will learn how to apply machine learning models to real data with significant safety implications. By the end of the summer, students will write a paper detailing the results of the study.

Skills: Students should have some experience with machine learning.

Leads: Alberto Bartesaghi, Cynthia Rudin

Description: Cryogenic electron microscopes (cryo-EM for short) allow researchers to peer at the microscopic shape of proteins like never before. These machines blast proteins with a 300,000-volt beam of electrons so that highly sensitive detectors underneath can tease out their shapes based on the resulting interactions. Being able to “see” proteins, life’s crucial building materials, can help determine how they function. Recognizing protein structure and function is essential for scientists trying to design better drugs to tackle some of the world’s most devastating diseases, including HIV, cancer, and Alzheimer’s disease. A 300,000-volt electron beam is, however, extremely damaging to the proteins it is trying to image. To help protect the samples in the machine, researchers cryogenically freeze them to maintain their integrity and use very low electron doses to prevent structural damage. This allows researchers to obtain images of intact proteins and biomolecular structures that were previously inaccessible to other technologies.

Modern electron detectors can record rapid bursts of frames (up to 1,500 frames per second), allowing the capture of individual electron events during the exposure, which results in images with extremely low signal-to-noise ratios, much like those obtained in low-light photography. The naturally occurring drift of the biological sample during the exposure (caused by beam-induced motion) is known to degrade image resolution when the frames are simply averaged. Motivated by recent advances in deep neural network approaches for natural image super-resolution and by burst photography techniques that harness natural hand tremor in smartphone cameras, we seek to apply these methods to improve the resolution of cryo-EM images of proteins. Improving resolution will allow the visualization of the 3D shape of these molecular machines at unprecedented levels of detail, providing new clues to uncover their mechanism of action.
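Why drift hurts a simple average, and how alignment helps, can be seen in one dimension. The sketch below is a classical shift-and-add illustration, not the deep-learning super-resolution approach this project proposes: it assumes noise-free 1-D frames, whole-sample drift, and circular shifts, whereas real cryo-EM frames are 2-D, extremely noisy, and drift non-uniformly by sub-pixel amounts. All names are illustrative.

```python
def shift(signal, k):
    """Circularly shift a 1-D signal right by k samples."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def best_shift(reference, frame, max_shift):
    """Integer drift of frame relative to reference, by maximum cross-correlation."""
    def score(k):
        undone = shift(frame, -k)  # undo a candidate drift of k samples
        return sum(r * u for r, u in zip(reference, undone))
    return max(range(-max_shift, max_shift + 1), key=score)


def average(frames):
    """Plain per-sample mean of the frames (what drift blurs)."""
    return [sum(values) / len(frames) for values in zip(*frames)]


def drift_corrected_average(frames, max_shift=3):
    """Align every frame to the first one, then average (shift-and-add)."""
    reference = frames[0]
    aligned = [shift(f, -best_shift(reference, f, max_shift)) for f in frames]
    return average(aligned)


# A single bright feature that drifts one sample per frame during the exposure.
truth = [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
frames = [shift(truth, d) for d in range(4)]

print(max(average(frames)))                  # peak smeared across 4 positions
print(max(drift_corrected_average(frames)))  # peak restored
```

Here the simple average spreads the feature over four positions at a quarter of its height, while the drift-corrected average recovers it exactly; with realistic noise and sub-pixel motion, alignment becomes much harder, which is where learned approaches come in.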

Outcomes: After completing this project, participants will have acquired experience applying modern machine learning and image processing techniques to an alluring and challenging research area in computational structural biology. Students will write computer code that will read low signal-to-noise ratio bursts of 50-100 frames and produce super-resolution images that will later be used for 3D reconstruction. Depending on how much progress is made, students may also write a research paper to describe their approach and present results obtained using real cryo-EM images.

Skills: Students should have some experience with machine learning and computer vision. Data science project experience in super-resolution (e.g., in CS 290 Data Science Competition) is a plus.


Explore prior years' CS Summer Research Projects