Project descriptions and video presentations are available below. Or, view all presentation videos: CS+ Undergraduate Summer 2020 Research Projects YouTube Playlist.
Description: PoeTiX is working to generate automated poetry with natural and artistic language. Using natural language processing tools including GPT-2 and BERT, together with coded rules that enforce the rules of the English language, the team has improved its ability to generate English sonnets that approach the authenticity of a human author. With dictionaries built around the diction of famous poets, PoeTiX can produce poems with the structure and sound of specific poets. We are working to find creative ways to impart greater lyrical and narrative meaning to these works, and in doing so we explore ethical questions about the boundaries between art, automation, and human intervention.
Lead: Brandon Fain
Description: In the American electoral system, representatives are elected to individual legislative districts and represent their constituents at various levels of government. The act of determining and redrawing the boundaries of legislative districts is known as redistricting. Gerrymandering is the practice of drawing legislative districts to systematically promote or suppress the interests of some part of the electorate. Districtings can be considered unfair in different ways. This project draws inspiration from the Voting Rights Act of 1965, which the Supreme Court has interpreted as implying (roughly) that racial and language minorities have had their votes diluted unfairly and illegally if they constitute a majority in a plausible district where their votes would result in a different winning party than in the districts to which they currently belong. We focus primarily on identifying two commonly accepted forms of unfairness in a map: cracking and packing. Cracked voters have been separated to dilute the power of their collective votes, while packed voters have been clustered together to prevent them from influencing the outcomes of more seats. Researchers have become increasingly interested in technical questions surrounding algorithmic fairness and bias in machine learning tasks like classification and clustering, and these systems are increasingly used to make real-world decisions. In this project, we attempt to apply ideas from algorithmic fairness in clustering to study fair and unbiased districting. We pulled data from recent elections in North Carolina and developed algorithms to identify instances of cracking and packing at both the district and state level. At the state level, we developed a metric to find an approximate number of cracked and packed voters for a given map. This metric compares the map we are interested in against a distribution of comparator maps to identify cracked and packed voters.
We have also developed an algorithm that can take a given map and alter it to produce a map optimized to perform well on this gerrymandering metric.
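As a rough illustration of comparing one map against a set of comparator maps, the sketch below labels districts rather than counting voters, and it is not the project's metric. The function names, the 0.5 win threshold, the `tolerance` value, and the rule of comparing rank-ordered vote shares are all assumptions made for this example.

```python
from statistics import mean


def ranked_shares(district_shares):
    """Sort one map's per-district vote shares for a party, lowest first."""
    return sorted(district_shares)


def compare_to_ensemble(map_shares, ensemble, tolerance=0.05):
    """Label each ranked district of a map relative to comparator maps.

    Hypothetical rule, for illustration only: a district the party wins by
    far more than the comparator average is labeled "packed"; a district the
    party loses with a share well below the comparator average is "cracked".
    """
    ranked = ranked_shares(map_shares)
    ranked_ensemble = [ranked_shares(m) for m in ensemble]
    labels = []
    for i, share in enumerate(ranked):
        # Average share of the i-th lowest district across comparator maps.
        baseline = mean(m[i] for m in ranked_ensemble)
        if share > 0.5 and share - baseline > tolerance:
            labels.append("packed")
        elif share < 0.5 and baseline - share > tolerance:
            labels.append("cracked")
        else:
            labels.append("typical")
    return labels


# Made-up vote shares for one party in a four-district state.
candidate_map = [0.80, 0.42, 0.43, 0.44]
comparator_maps = [
    [0.55, 0.52, 0.48, 0.54],
    [0.56, 0.50, 0.51, 0.52],
]
labels = compare_to_ensemble(candidate_map, comparator_maps)
```

With these made-up numbers, the party's supporters are concentrated in one district and fall just short in the other three, which is the pattern the description calls packing and cracking.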
Description: The Lisp family of programming languages has long been used in academia, both in research and as a teaching tool. Higher-order procedures, polymorphism, and functional abstraction yield elegant and accurate programs. In this project, we explore some roles Scheme, a dialect of Lisp, may have in protein design applications. To do so, we develop a suitable Scheme interface for OSPREY 3.0, our laboratory's protein design software package.
Lead: Xiaowei Yang
Description: Both the field of AI and the field of computer networks have advanced significantly in recent years. In computer networks we have witnessed a plethora of new programmable devices, new network architectures, and smart devices. In AI, machine learning with deep neural networks has revolutionized many real world applications. This project aims to study how AI can advance the field of networking and vice versa. Students will develop algorithms and implement software to apply AI and machine learning techniques to develop intelligent solutions to problems for programmable networked devices.
Lead: Rong Ge
Description: Despite the empirical success of deep neural networks, theoretical analyses of their performance are still insufficient. This project investigates the loss landscape around local minima with different generalization performance. We also provide both theoretical and empirical analyses of Hessians at minima. Our findings will lead to a deeper understanding of the optimization process and of generalization, which we hope will provide insights for designing better deep learning algorithms.
Lead: Kristin Stephens-Martinez
Description: What Will Python Do (or WWPD) is a web application that provides an online quiz tool to students enrolled in CS101 at Duke University. CS101 focuses on a variety of core concepts related to the Python programming language. Therefore, we have created a large SQL database of randomly generated Python questions that reflect the fundamental concepts learned in CS101. Students enrolled in the class can log in to WWPD and choose from an array of concepts to create a quiz based on what they want to review. After taking the quiz, our site provides immediate feedback as to how the student performed on the quiz, specifically highlighting any topics that they may be struggling with. Furthermore, the students can visit their dashboard to see a visualized overview of their mastery on all topics listed on our site. Lastly, the instructor(s) have their own dashboard where they can monitor usage on the site and see how well students are understanding different topics.
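To make the idea of randomly generated questions concrete, here is a minimal sketch of how "what will Python do" questions might be produced and grouped by concept. It is not the WWPD implementation: the concept names, the question templates, and the functions `run_snippet` and `build_quiz` are assumptions for this example, and it keeps questions in memory instead of a SQL database.

```python
import contextlib
import io
import random


def run_snippet(code):
    """Execute a snippet and return what it prints, without the trailing newline."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().rstrip("\n")


def make_indexing_question(rng):
    """Random list-indexing question; the index is always in range."""
    values = [rng.randint(0, 9) for _ in range(4)]
    index = rng.randrange(-4, 4)
    code = f"lst = {values}\nprint(lst[{index}])"
    return {"concept": "list indexing", "code": code, "answer": run_snippet(code)}


def make_floor_division_question(rng):
    """Random floor-division question with a nonzero divisor."""
    a, b = rng.randint(10, 99), rng.randint(2, 9)
    code = f"print({a} // {b})"
    return {"concept": "floor division", "code": code, "answer": run_snippet(code)}


# Hypothetical mapping from concept name to question generator.
GENERATORS = {
    "list indexing": make_indexing_question,
    "floor division": make_floor_division_question,
}


def build_quiz(concepts, n_per_concept, seed=None):
    """Build a quiz covering only the concepts the student selected."""
    rng = random.Random(seed)
    return [GENERATORS[c](rng) for c in concepts for _ in range(n_per_concept)]


quiz = build_quiz(["list indexing", "floor division"], n_per_concept=3, seed=0)
```

Because each answer is obtained by running the generated snippet, the stored answer always matches what Python actually prints. Tagging every question with its concept is what would let per-topic feedback be computed after a quiz.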
Lead: Robert Duvall
Description: MyTale is a simple, easy-to-use interface where users can explore, personalize, and create social stories, a tool developed by Carol Gray to help autistic children improve communication and self-care skills. From the MyTale search page, users can choose a social story that fits their needs with the help of tags and filters geared toward specific audiences and then personalize that story. Users can also upload stories that they have created or edited and want others to see. Our Story Editor gives the user the tools to easily personalize stories to their requirements, ranging from the color scheme to the content in the story. If the user cannot find a story to their liking, they can simply create their own. MyTale reinvents the way social stories are delivered, providing a comfortable and streamlined user experience.
Leads: Alberto Bartesaghi, Cynthia Rudin
Description: Cryogenic electron microscopy (cryo-EM) allows researchers to peer at the microscopic shape of proteins like never before. Being able to see proteins can help determine how they function. Recognizing protein structure and function is essential for scientists trying to design better drugs to tackle some of the world's most devastating diseases, including HIV, cancer, and, most recently, SARS-CoV-2. Modern electron detectors can record rapid bursts of frames (up to 1,500 frames per second). The challenge of this approach comes from the extremely low signal-to-noise ratio of the images, much like that of images obtained in low-light photography. Additionally, the naturally occurring drift of the biological sample during the exposure is known to degrade image resolution when a simple average of the frames is used. Motivated by recent advances in deep neural network approaches for natural image super-resolution and by burst photography techniques that harness natural hand tremor on smartphone cameras, we seek to apply these methods to improve the resolution of cryo-EM images of proteins. Our approach has the following steps: (1) we created a regression model to score various frame alignment strategies using multiple neural network architectures such as ResNet, (2) we optimized the alignment of frames on a finer grid using the regression model as a guide, with the goal of achieving the highest possible score, and (3) we benchmarked this strategy against the baseline solution, which used cross-correlation as the scoring metric. Our results indicate that super-resolution techniques can be applied to cryo-EM images and have the potential to overcome the limitations imposed by the imaging system and improve the resolution of protein structures.
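The cross-correlation baseline mentioned in step (3) can be sketched in a simplified form. The example below assumes one-dimensional frames and whole-sample shifts, whereas real cryo-EM frames are two-dimensional and are aligned on a finer grid. The function names are hypothetical, and the learned regression score from steps (1) and (2) is not reproduced here.

```python
def cross_correlation(reference, frame, shift):
    """Sum of products of overlapping samples after moving `frame` by `shift`."""
    total = 0.0
    for i, r in enumerate(reference):
        j = i - shift  # sample of `frame` that lands on position i
        if 0 <= j < len(frame):
            total += r * frame[j]
    return total


def best_shift(reference, frame, max_shift):
    """Integer shift in [-max_shift, max_shift] with the highest correlation."""
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda s: cross_correlation(reference, frame, s),
    )


def align_and_average(frames, max_shift=3):
    """Align every frame to the first one, then average the overlapping samples."""
    reference = frames[0]
    n = len(reference)
    summed = [0.0] * n
    counts = [0] * n
    for frame in frames:
        s = best_shift(reference, frame, max_shift)
        for i in range(n):
            j = i - s
            if 0 <= j < n:
                summed[i] += frame[j]
                counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(summed, counts)]


# Toy data: the second frame is the first one drifted two samples to the right.
first = [0, 0, 1, 5, 1, 0, 0, 0]
drifted = [0, 0, 0, 0, 1, 5, 1, 0]
naive = [(a + b) / 2 for a, b in zip(first, drifted)]
aligned = align_and_average([first, drifted])
```

In this toy case the naive average spreads the peak over several samples, while the aligned average recovers the original profile. The project's approach, as described, replaces the cross-correlation score with the regression model's score.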
Lead: Joe Shamblin
Description: The goal of this project is to end up with a working device that will use a low-powered, single-board computer (SBC) with a camera to classify and record species of birds in a natural setting to a central repository (a database, for instance). Ideally, it would be possible to hook the device up to a battery and solar recharger and deploy it in the field. In order to achieve this goal, multiple object detection models will be researched and tested for their viability to run on a smaller device, in this case, the NVIDIA Jetson Nano. Specifically, model accuracy and optimizations will be explored and tested with the camera on the device with the hopes of achieving a frame rate and accuracy that allows for sufficient data to classify species as they come in view. This would be an excellent source of information for biologists tracking species migration patterns. Additionally, the development process and final working device can be applicable to other fields involving the use of cameras and object detection.
Lead: Kartik Nayak
Description: Consensus protocols play a central role in the development of blockchains. In this project, we studied different protocols, namely Sync-HotStuff and Byzantine Fault Tolerant (BFT), and how they address the challenges faced by blockchain systems, such as decentralization, crash failures, and Byzantine faults. We worked on an existing open-source codebase (Concord-BFT from VMware) and implemented the steady state of the Sync-HotStuff protocol. In addition, we enabled logging to monitor the system robustly and to help verify that it behaves correctly. In the future, we hope to continue the implementation and realize the view-change protocol in Sync-HotStuff. We also hope to evaluate the two consensus protocols using a series of Apollo tests.
Leads: Jun Yang, Sudeepa Roy, Kristin Stephens-Martinez
Description: Our project was to create an interactive debugger for SQL, which is a ubiquitous query language for accessing and modifying data stored in relational databases. In general, SQL queries can be composed of smaller query parts called subqueries, which can be nested inside of each other. For this reason, we wanted to allow users to interactively trace through individual subqueries, including correlated subqueries where the results of the subquery depend on the results of another subquery. To achieve this goal, we have created I-Rex, a system designed to help users understand SQL query evaluation and debug SQL queries. I-Rex offers a novel interface for interactively tracing SQL query evaluation in a way faithful to how queries are written syntactically. We support this even for complex queries involving multiple levels of nesting and correlation. In the case where the user is trying to answer a specific question with SQL, such as in a classroom setting, I-Rex also helps make clear why a query returns an incorrect answer with respect to a correct query over a test database instance. It does this by letting users focus on smaller database instances contained in the large one (which we call counterexamples) that still show how the user's incorrect query differs from the correct query. We plan to deploy I-Rex and assess its effectiveness in Duke's Introduction to Database Systems course (CompSci 316) in the Fall 2020 semester.
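For readers unfamiliar with correlated subqueries, the sketch below shows what tracing one might mean: the inner query is re-evaluated for each row of the outer query, and the intermediate value is recorded. It uses Python's built-in `sqlite3` module and a made-up `employee` table. The `trace` function is hypothetical and is not how I-Rex itself is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        ('Ann', 'eng', 120), ('Bob', 'eng', 100),
        ('Cat', 'ops', 90),  ('Dan', 'ops', 95);
""")

# Employees who earn more than the average salary of their own department.
# The subquery is correlated: it refers to e.dept from the outer query.
FULL_QUERY = """
    SELECT e.name
    FROM employee AS e
    WHERE e.salary > (SELECT AVG(s.salary)
                      FROM employee AS s
                      WHERE s.dept = e.dept)
    ORDER BY e.name
"""


def trace(connection):
    """Evaluate the subquery once per outer row and record each step."""
    steps = []
    outer_rows = connection.execute(
        "SELECT name, dept, salary FROM employee ORDER BY name"
    ).fetchall()
    for name, dept, salary in outer_rows:
        (dept_avg,) = connection.execute(
            "SELECT AVG(salary) FROM employee WHERE dept = ?", (dept,)
        ).fetchone()
        steps.append((name, dept, salary, dept_avg, salary > dept_avg))
    return steps


steps = trace(conn)
kept = [name for name, _, _, _, passes in steps if passes]
full_result = [row[0] for row in conn.execute(FULL_QUERY)]
```

Each tuple in `steps` exposes the department average that the subquery produced for that outer row, which is the kind of intermediate result a debugger can display. The rows that pass the comparison match the result of running the full query directly.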