Sounderfeit: cloning a physical model using a conditional adversarial autoencoder

Stephen Sinclair

doi:10.5216/mh.v18i1.53570

Authors

Stephen Sinclair Inria Chile, Santiago, Chile http://orcid.org/0000-0001-7869-3503

Keywords:

Physical modeling, Sound synthesis, Autoencoder, Latent parameter space

Abstract

An adversarial autoencoder conditioned on known parameters of a physical-modeling bowed-string synthesizer is evaluated for use in parameter estimation and resynthesis tasks. Latent dimensions are provided to capture variance not explained by the conditional parameters. Results are compared with and without adversarial training, and a system capable of “copying” a given bidirectional parameter-signal relationship is examined. A real-time synthesis system built on a generative, conditioned and regularized neural network is presented, allowing engaging sound synthesizers to be constructed purely from recorded data.
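The abstract combines three ingredients: an encoder that estimates the synthesizer's known parameters, extra latent dimensions that absorb the remaining variance, and an adversarial regularizer that pushes those latent codes toward a prior distribution. The stdlib-only Python sketch below shows how these pieces fit together in the forward pass and loss terms of a generic conditional adversarial autoencoder. It is not the author's implementation: the layer sizes (`N_FEATURES`, `N_PARAMS`, `N_LATENT`), the single linear layers, the standard Gaussian prior, spectral frames as input, and feeding the known parameters to the decoder during training are all illustrative assumptions, and the gradient-based training loop is omitted.

```python
import math
import random

# Hypothetical sizes: a short spectral frame, two known synthesis parameters
# (standing in for, e.g., bow pressure and bow position) and two free latent
# dimensions for variance the parameters do not explain.
N_FEATURES, N_PARAMS, N_LATENT = 8, 2, 2

rng = random.Random(0)


def make_dense(n_in, n_out):
    """Small random weights (n_out x n_in) and zero biases for one linear layer."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def dense(layer, v):
    """Apply a linear layer: W v + b."""
    w, b = layer
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


encoder = make_dense(N_FEATURES, N_PARAMS + N_LATENT)  # frame -> [p_hat, z]
decoder = make_dense(N_PARAMS + N_LATENT, N_FEATURES)  # [p, z] -> frame
discriminator = make_dense(N_LATENT, 1)                # z -> logit "from prior"


def encode(x):
    """Split the encoder output into a parameter estimate and a latent code."""
    h = dense(encoder, x)
    return h[:N_PARAMS], h[N_PARAMS:]


def decode(p, z):
    """Generate a frame conditioned on parameters p and latent code z."""
    return dense(decoder, p + z)


def loss_terms(x, p):
    """Loss terms for one training example (frame x, known parameters p)."""
    p_hat, z = encode(x)
    x_hat = decode(p, z)  # decoder is conditioned on the known parameters
    z_prior = [rng.gauss(0.0, 1.0) for _ in range(N_LATENT)]  # N(0, I) sample
    d_fake = sigmoid(dense(discriminator, z)[0])
    d_real = sigmoid(dense(discriminator, z_prior)[0])
    return {
        "reconstruction": mse(x, x_hat),  # autoencoding error
        "parameter": mse(p, p_hat),       # parameter-estimation error
        # Discriminator objective: prior samples are "real", codes are "fake".
        "discriminator": -(math.log(d_real) + math.log(1.0 - d_fake)),
        # Encoder objective: make latent codes indistinguishable from the prior.
        "adversarial": -math.log(d_fake),
    }


# Usage with stand-in data (not real synthesizer output).
x = [rng.random() for _ in range(N_FEATURES)]
p = [0.3, 0.7]
terms = loss_terms(x, p)
p_est, _ = encode(x)                           # parameter estimation
x_syn = decode([0.5, 0.5], [0.0] * N_LATENT)   # resynthesis with z at the prior mean
```

In a full system the discriminator and the autoencoder would be updated in alternation by gradient descent. Dropping the `adversarial` term and the discriminator roughly corresponds to the non-adversarial variant that the abstract compares against.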

Author biography

Stephen Sinclair (Inria Chile, Santiago, Chile), stephen.sinclair@inria.cl
