Sounderfeit: cloning a physical model using a conditional adversarial autoencoder

Stephen Sinclair

doi:10.5216/mh.v18i1.53570

Authors

Stephen Sinclair Inria Chile, Santiago, Chile http://orcid.org/0000-0001-7869-3503

DOI:

https://doi.org/10.5216/mh.v18i1.53570

Keywords:

Physical modeling, Sound synthesis, Autoencoder, Latent parameter space

Abstract

An adversarial autoencoder conditioned on known parameters of a physical modeling bowed string synthesizer is evaluated for use in parameter estimation and resynthesis tasks. Latent dimensions are provided to capture variance not explained by the conditional parameters. Results are compared with and without adversarial training, and a system capable of “copying” a given bidirectional parameter-signal relationship is examined. A real-time synthesis system built on a generative, conditioned, and regularized neural network is presented, allowing engaging sound synthesizers to be constructed purely from recorded data.

Author Biography

Stephen Sinclair, Inria Chile, Santiago, Chile

Stephen Sinclair (Inria Chile, Santiago, Chile) stephen.sinclair@inria.cl
