Example Projects from Fall 2019

Automatic Page-Turner for Musicians
Vanessa Yan, Qinying Sun & Sally Ma

šŸ† Class Choice Award!Ā 

In this final project for Yale CPSC 459/559, we build an interactive end-to-end system that automatically turns the page for musicians in real time. The user uploads multiple pages of music scores to a web app, which parses the scores and detects the pitch and duration of each note at the end of each page. At the click of a button, the system begins to listen to the musician play, matching the visual and audio information to determine whether the musician has arrived at the end of a page, at which point the web app displays the next page of the score.
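The pitch-detection step described above could be sketched with a simple zero-crossing estimator. This is an illustrative stand-in, not the project's actual implementation; the function name and synthetic test tone are our own:

```python
import math

def estimate_pitch(samples, sample_rate):
    """Rough pitch estimate from zero crossings: a sinusoid crosses
    zero twice per cycle, so pitch ≈ crossings / (2 * duration)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)

# One second of a synthetic A4 (440 Hz) tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr + 0.1) for t in range(sr)]
estimate_pitch(tone, sr)  # close to 440 Hz
```

A real system would instead run a windowed spectral analysis so it can track note durations as well as pitches, but the matching idea is the same: compare detected pitches against the notes parsed from the score.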

Report



LLAMA: Learning Latent trAnsforMations for generative style trAnsfer
William Hu, Sydney Thompson & Nathan Tsoi

šŸ… Class Choice Honorable MentionĀ 

We present a method and a tool, called StyleApp, for smooth interpolation between images of different styles. Our technique gives the user control over the visual properties of style and works even when only one sample of a given style is provided. We also explore different architectures and techniques to facilitate realistic generation of handwriting styles. In particular, we first train two types of variational autoencoders on EMNIST to learn character representations and then fine-tune the models on samples of our own handwriting to create person-specific networks for style. After creating our individualized style networks, we investigate latent space clustering and linear transformations as potential methods for extracting semantic meaning from our learned representations. Though our application currently uses labeled data, we show that unsupervised methods of learning semantics from the compressed representation of images are possible and hope that our findings will enable future work.
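The smooth interpolation between styles described above amounts to blending two latent codes before decoding. A minimal sketch, assuming hypothetical 4-dimensional latent codes (the real VAE latents would be higher-dimensional):

```python
def interpolate_latents(z_a, z_b, alpha):
    """Linear blend between two latent codes:
    alpha=0 returns z_a, alpha=1 returns z_b."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]

# Hypothetical latent codes for two handwriting styles.
style_a = [0.0, 1.0, -0.5, 2.0]
style_b = [1.0, -1.0, 0.5, 0.0]
midpoint = interpolate_latents(style_a, style_b, 0.5)
# midpoint = [0.5, 0.0, 0.0, 1.0]
```

Sweeping alpha from 0 to 1 and decoding each blended code yields the smooth visual transition between the two styles.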

Report



Style Transformation on Human Faces using Feedforward and Generative Methods
Annie Gao, Valerie Chen & Yichao Cheng

Human faces are varied in nature and in style, but often structurally similar. Recent approaches using feedforward and generative methods have demonstrated photo-realistic capabilities for artistic style transfer, which we believe hold potential for human facial feature transformation. We investigate the FastCNN, CycleGAN, and StyleGAN architectures to transform an input image into an 'aged' version of the image. We made architecture-specific modifications, including changing the loss network and loss weights, altering the training data set, and adding a mapping network, to handle the transformation of a new input image. The results for each architecture varied. We present a qualitative analysis of each method's output transformation and example transformed images. Finally, we highlight some benefits and drawbacks of each method.
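The style losses referred to above are commonly built on Gram matrices of feature maps, which capture style as channel-to-channel correlations. A toy sketch of that computation (our illustration; the project's loss networks operate on deep convolutional features, not raw values like these):

```python
def gram_matrix(features):
    """Gram matrix of a feature map given as [channels][positions]:
    entry (i, j) is the inner product of channel i and channel j."""
    return [
        [sum(a * b for a, b in zip(fi, fj)) for fj in features]
        for fi in features
    ]

# Two channels over three spatial positions (toy values).
feats = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 1.0]]
gram_matrix(feats)  # [[5.0, 2.0], [2.0, 2.0]]
```

A style loss then penalizes the difference between the Gram matrices of the generated and target images, which is why changing the loss network or reweighting its layers changes which stylistic statistics the transformation preserves.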

Report



Lights! Camera! (Optimal) Action! Learning a Lighting Policy for Robot Photography
Joe Connolly, Kayleigh Bishop & Simon Mendelsohn

Manually ensuring proper lighting when capturing portrait photos can be a time-consuming and tedious task, and it is difficult for novice users. To ease the burden on photographers, and to enable automated systems to take high-quality photos, we designed a routine that automatically adjusts lighting to optimally prepare a robot photography system for capturing portrait photographs. We combine multiple existing image quality metrics and use them as a reward for a neural network that learns how to optimize lighting. In our arrangement, the network is connected to two Hue light bulbs whose brightness it can adjust as it sees fit. We achieve promising results and demonstrate the feasibility and practicality of such a robot photography system. We also highlight exciting avenues for future research.
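The optimization loop described above can be caricatured as a search over two brightness values guided by a reward. The sketch below is ours, not the project's code: it uses greedy hill climbing and a stand-in quadratic reward where the real system uses a learned network and image quality metrics. The Hue brightness range 0-254 is the one exposed by the Hue API:

```python
import random

def hill_climb_lighting(reward, start, step=5, iters=500, seed=0):
    """Greedy search over bulb brightness values (Hue range 0-254):
    nudge one bulb by ±step, keep the change only if reward improves."""
    rng = random.Random(seed)
    state, best = list(start), reward(start)
    for _ in range(iters):
        cand = list(state)
        i = rng.randrange(len(cand))
        cand[i] = min(254, max(0, cand[i] + rng.choice((-step, step))))
        r = reward(cand)
        if r > best:
            state, best = cand, r
    return state, best

# Stand-in reward: peaks when both bulbs sit at brightness 180
# (the real reward comes from image quality metrics on captured photos).
reward = lambda b: -((b[0] - 180) ** 2 + (b[1] - 180) ** 2)
best_state, best_reward = hill_climb_lighting(reward, [100, 100])
```

The key design point carries over: because the reward is computed from captured images rather than from a known function, the system can only query it, which is why a learning-based (rather than analytic) optimizer is needed.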

Report Source Code



Climb a-GAN: Generation of Rock Climbing Problems
Cove Geary & Joseph Valdez

Many rock climbing gyms have adopted a new type of rock climbing wall: the MoonBoard. The MoonBoard is a standardized interactive training wall of 11 x 18 holds that is identical in each gym. The MoonBoard website provides a space for users to create new custom paths, upload them, and even rate other paths. Using an ACGAN, a user can input a difficulty for a desired rock climbing path and have the system generate a new suggested path of that difficulty that is compatible with the MoonBoard. Existing GitHub projects for the MoonBoard generate a random path and then classify its difficulty, so our approach works in reverse. By generating a path from a given difficulty, we make it easier for climbers to find a path they can attempt (and enjoy). The class labels in the ACGAN are the difficulty grades of the rock climbing routes. The ACGAN outputs an image of a MoonBoard with the starting holds circled in green, the intermediate holds circled in blue, and the ending holds circled in red.
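The conditioning mechanism that lets a user request a difficulty is the defining feature of an ACGAN: the generator receives a noise vector concatenated with a class label. A minimal sketch of that input construction (our illustration; the dimensions, function name, and number of grades are assumptions, not the project's values):

```python
import random

def make_generator_input(difficulty, n_grades, z_dim=8, seed=None):
    """ACGAN-style conditioning: concatenate a Gaussian noise vector
    with a one-hot encoding of the requested difficulty grade."""
    rng = random.Random(seed)
    noise = [rng.gauss(0, 1) for _ in range(z_dim)]
    one_hot = [1.0 if g == difficulty else 0.0 for g in range(n_grades)]
    return noise + one_hot

vec = make_generator_input(difficulty=3, n_grades=5, z_dim=8, seed=0)
# len(vec) == 13: 8 noise dimensions + 5 grade classes
```

During training, the ACGAN discriminator both judges realism and predicts the grade, which is what forces the generator to respect the requested difficulty rather than ignore the label.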

Report Source Code



Man vs Machine: Discerning Between In-Person and Electronic Audio
Emmanuel Adeniran, Nicholas Georgiou & Debasmita Ghose

Determining whether human voices are being played through electroacoustic transducers, like a television's or computer's loudspeakers, or originate live from a person, is an important distinction for voice-activated devices to make. Ideally, these devices should be able to discern whether audio is coming from people who are contemporaneous and collocated with the robot, versus audio that is coming from an electronic device. The ability of a robot to discern these "here and now" scenarios is critical to enabling a robot to understand the circumstances surrounding human social interactions. This work addresses the problem by implementing a method to differentiate attributes that are characteristically and consistently fundamental to voices replayed through loudspeakers (electronic) versus produced by a person (in-person). We therefore attempt to build a model that can distinguish between attributes of natural and electronic audio, recorded on a given microphone. (…)
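Attributes like those described are typically spectral statistics of the recorded signal. As one illustrative example (ours, not necessarily a feature the project used), the spectral centroid, which loudspeaker frequency responses tend to shift, can be computed from a naive DFT:

```python
import cmath
import math

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency over the first half of a
    naive DFT (O(n^2); fine for a short analysis window)."""
    n = len(samples)
    mags, freqs = [], []
    for k in range(n // 2):
        s = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append(abs(s))
        freqs.append(k * sample_rate / n)
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

# 64-sample window of a pure 1 kHz tone at 8 kHz.
sr = 8000
clip = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(64)]
spectral_centroid(clip, sr)  # near 1000 Hz
```

A classifier would consume a vector of such features, computed per analysis window, and learn which ranges are characteristic of replayed versus live speech on the given microphone.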

Report