
AudioSculpt Presentation

AudioSculpt literally lets you sculpt sounds, that is, shape them visually. The application is dedicated to sound visualization, analysis and manipulation.

Sound Representation and Manipulation

Graphic Representation

AudioSculpt offers three representations of a sound, or signal[1]: the amplitude/time waveform, the spectrum, and the sonagram. These graphic representations help the user choose relevant control parameters for sound treatment and analysis, and they serve as the interface for performing sound manipulations.
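To make the step from waveform to spectrum concrete, here is a minimal Python sketch (not AudioSculpt code) that computes the magnitude spectrum of a short waveform with a naive discrete Fourier transform. The sample rate, frame length, test tone and the function name `magnitude_spectrum` are illustrative assumptions; a real tool would use an FFT.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 of a real-valued frame (hypothetical helper)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

fs = 8000   # sample rate in Hz (assumption)
n = 400     # frame length, so bins are fs / n = 20 Hz apart
waveform = [0.5 * math.sin(2 * math.pi * 440 * t / fs) for t in range(n)]  # amplitude/time view
spectrum = magnitude_spectrum(waveform)                                      # frequency/amplitude view
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin, peak_bin * fs / n)  # 22 440.0
```

The 440 Hz tone completes exactly 22 cycles in the frame, so its energy lands in a single bin.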

Treatments and Analysis

AudioSculpt's treatments and analysis methods rely on two kernels[2] that bundle the necessary algorithms[3]:

  • SuperVP: the Super Phase Vocoder[4] tool and audio analysis library;
  • Pm2: Partial Manager 2, a sinusoidal modeling[5] tool.

The user manipulates sounds by controlling these kernels, either graphically through the dialogs provided by AudioSculpt, or through the command line interface integrated in AudioSculpt. Treatments are represented symbolically and organized in a processing sequencer, where each one can be moved, enabled or disabled individually.

Graphic Manipulation of Sound

The interface borrows concepts from graphic design software: the user can define time-frequency regions within the sound representation and then transform and manipulate them. Visualizing sound in the time-frequency domain offers a wide range of possibilities for advanced processing.
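Editing a time-frequency region can be sketched as follows, assuming a sonagram stored as a list of frames, each a list of bin magnitudes. The hypothetical function `apply_region_gain` scales every bin whose frame start time and bin centre frequency fall inside a rectangle; a large negative gain approximates deleting the region. This illustrates the idea only and does not reproduce AudioSculpt's region tools.

```python
def apply_region_gain(sonagram, fs, n_fft, hop, t_range, f_range, gain_db):
    """Return a copy of `sonagram` with a gain applied inside a time-frequency rectangle.

    sonagram: list of frames, each a list of n_fft // 2 + 1 magnitudes.
    t_range, f_range: inclusive (start, end) in seconds and in Hz.
    """
    gain = 10 ** (gain_db / 20)  # dB to linear amplitude factor
    t0, t1 = t_range
    f0, f1 = f_range
    out = []
    for i, frame in enumerate(sonagram):
        t = i * hop / fs  # start time of frame i
        new_frame = []
        for k, mag in enumerate(frame):
            f = k * fs / n_fft  # centre frequency of bin k
            inside = t0 <= t <= t1 and f0 <= f <= f1
            new_frame.append(mag * gain if inside else mag)
        out.append(new_frame)
    return out

# Toy sonagram (assumed sizes): 4 frames x 5 bins, all magnitudes 1.0
sonagram = [[1.0] * 5 for _ in range(4)]
edited = apply_region_gain(sonagram, fs=8000, n_fft=8, hop=4,
                           t_range=(0.0005, 0.001), f_range=(1000, 2000), gain_db=-20)
```

Only frames 1 and 2 (starting at 0.5 ms and 1 ms) and bins 1 and 2 (1000 Hz and 2000 Hz) are attenuated to 0.1; the input is left untouched.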

Main Functionalities


The AudioSculpt interface has four windows:

  • the waveform and the sonagram, two synchronized representations for visualizing sounds at every scale, from the macroscopic (the overall sequence) to the microscopic (the sample level);
  • the frequency/amplitude spectral envelope, which represents the amplitude of the sound components at a given point in time;
  • the processing sequencer.

Several analysis methods are available in AudioSculpt:

  • STFT[7] (Short-Time Fourier Transform), computed with the FFT[6]: the sound is described, frame by frame as it evolves over time, as a sum of sinusoids, each with a given frequency, amplitude and phase.
  • LPC (Linear Predictive Coding): an audio signal and speech processing tool that estimates the spectral envelope of a digital signal based on linear prediction[8]. LPC estimates are optimal for noise signals but systematically biased for voiced or harmonic sounds.
  • True-Envelope: a spectral envelope estimation technique that allows real-time processing and gives efficient estimates even for problematic, high-pitched signals. The phase vocoder also uses it during pitch modifications to preserve the spectral envelope, and hence the timbre, even for signals that contain both sinusoidal and noise components.
  • Fundamental frequency[9]: this technique detects the perceived pitch of a sound in most cases and copes well with pitch that varies over time. It also serves partial processing for periodic[10] or quasi-periodic sounds.
  • Partial tracking: in a sound whose frequency and amplitude evolve over time, spectral peaks are grouped coherently into partials according to their frequency and amplitude variations.
  • Automatic segmentation methods: these techniques delimit temporal zones in the sound by means of markers. Markers can point to transients or to spectral variations between FFT frames. The detected events can then be filtered according to their significance and edited by the user.
  • Formants[11]: this technique detects the zones of the spectrum where partials carry more energy. It mostly applies to speech analysis: specific frequency regions are reinforced by the resonances of the vocal tract, which is what allows vowels to be perceived.
  • Tuning fork and harmonic tools: they allow the user to measure individual signal components and listen to them separately.
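The STFT analysis behind the sonagram can be sketched as follows: the signal is cut into overlapping frames, each frame is multiplied by a window, and the magnitudes of its DFT become one column of the sonagram. This is a plain textbook STFT in Python, not SuperVP's implementation; the Hann window, frame size, hop size and the two-tone test signal are assumptions, chosen so that each tone falls exactly on a bin.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / n) for m in range(n)]

def stft_magnitudes(signal, n_fft, hop):
    """Sonagram as a list of frames; each frame holds |X_k| for k = 0..n_fft/2."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + m] * window[m] for m in range(n_fft)]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * m / n_fft) for m, x in enumerate(seg)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames

fs = 8000
n_fft, hop = 128, 64  # bin spacing fs / n_fft = 62.5 Hz
# 0.1 s at 500 Hz followed by 0.1 s at 1500 Hz
sig = [math.sin(2 * math.pi * (500 if t < 800 else 1500) * t / fs) for t in range(1600)]
sonagram = stft_magnitudes(sig, n_fft, hop)
peaks = [max(range(len(f)), key=f.__getitem__) for f in sonagram]
print(peaks[0] * fs / n_fft, peaks[-1] * fs / n_fft)  # 500.0 1500.0
```

Reading the peak bin of each frame shows the pitch change over time, which is exactly what the sonagram displays as a step in the dark trace.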
Sound Processing
  • Filtering: various tools let the user select individual components, or whole time-frequency regions within a sonagram, and then change their amplitude or virtually delete them from the spectrum. The various sound sources of a single sequence can also be separated, as in an unmixing operation.
  • Time compression/expansion: the duration of sounds can be changed while preserving their pitch and timbral characteristics, including attacks and transients.
  • Transposition: conversely, pitch can be changed without affecting the duration or quality of the sound. The spectral components and the spectral envelope can also be transposed separately to create timbral modifications.
  • Denoising: spectral subtraction with interpolation between noise estimates, designed to preserve signal quality as far as possible. An average noise spectrum is estimated and subtracted from the average signal spectrum, which improves the average signal-to-noise ratio.
  • Cross Synthesis: characteristics of one signal, such as its spectral amplitudes or spectral envelope, are applied to the spectral components of another signal to create a hybrid sound, or a transition from one sound to another.
  • Partial Synthesis: a new sound is created from the data of a preexisting partial analysis.
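Denoising by spectral subtraction can be sketched at the level of a single STFT frame. This is the basic textbook form, assuming the noise spectrum is averaged over frames known to contain only noise; it does not model SuperVP's interpolation between noise estimates. The function names and the `floor` parameter, which keeps a fraction of the original magnitude to limit artifacts, are illustrative assumptions.

```python
import cmath

def average_noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames assumed to contain only noise."""
    count = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(abs(frame[k]) for frame in noise_frames) / count for k in range(n_bins)]

def spectral_subtract(frame, noise_mag, floor=0.0):
    """Subtract the noise magnitude from each complex bin, keeping the original phase.

    The resulting magnitude never drops below floor * |X|, so it cannot go negative.
    """
    cleaned = []
    for x, n in zip(frame, noise_mag):
        mag = max(abs(x) - n, floor * abs(x))
        cleaned.append(cmath.rect(mag, cmath.phase(x)))
    return cleaned

# Toy example: three bins, two noise-only frames (made-up values)
noise = average_noise_magnitude([[0.1, 0.2j, 0.1], [0.3, -0.2, 0.1j]])
cleaned = spectral_subtract([1.0, 0.1, -0.5j], noise)
print([round(abs(c), 3) for c in cleaned])  # [0.8, 0.0, 0.4]
```

Bins where the noise estimate exceeds the observed magnitude are silenced; raising `floor` trades residual noise for fewer "musical noise" artifacts.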

These treatments can be assigned to individual tracks of the processing sequencer and previewed in real time before the final processing.

  1. Signal (cf. Sound)

    In this document, "signal" implicitly means "sound".

    Signals can represent sound, motion, images or video. In electrical engineering, a signal is a quantity that varies in time or in space. It can be a physical quantity, human information or machine data, and it is always part of a system. Treating physical, human and machine signals alike as measurable quantities makes it easier to study, design and implement both the signals and the systems that transmit, store and manipulate information.

    In the sound domain, signals are either analog or digital. An analog signal is a continuous-time (CT) signal; a digital signal is a discrete-time (DT) signal whose values are also quantized.

    The signal of an electric guitar playing through an amplifier is analog: the voltage fluctuates continuously and is not quantized. A digital sound is produced from a discrete signal, that is, a temporal sequence of quantized values. Digital signals can be obtained by sampling analog signals; for instance, a microphone converts acoustic pressure into a voltage signal, which can then be sampled. AudioSculpt deals with digital signals that associate an amplitude value, related to the acoustic pressure at one point in space, with each sampled instant; levels are usually expressed in decibels (dB). Sound signals are sampled n times per second, in most cases 44,100 times. Each sample contains data for one or more channels, as in mono, stereo or multichannel recording.
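Sampling and quantization can be illustrated with a short sketch that samples a sine wave 44,100 times per second and rounds each sample to a signed integer. The 16-bit resolution, the amplitude of 0.8 and the helper name `sample_and_quantize` are assumptions for illustration, not AudioSculpt settings.

```python
import math

def sample_and_quantize(freq_hz, duration_s, fs=44100, bits=16, amplitude=0.8):
    """Sample a sine wave at fs Hz and quantize each sample to a signed integer."""
    max_int = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    n_samples = round(duration_s * fs)
    return [round(amplitude * max_int * math.sin(2 * math.pi * freq_hz * n / fs))
            for n in range(n_samples)]

samples = sample_and_quantize(440, 0.01)  # 10 ms of a 440 Hz tone, mono
print(len(samples), min(samples), max(samples))
```

Each list element is one sample of one channel; a stereo file would interleave or pair two such values per instant.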

  2. Kernel

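    In computing, the kernel is the central component of most computer operating systems: it is a bridge between applications and the actual data processing done at the hardware level. By analogy, AudioSculpt's kernels are the processing engines that carry out the actual analysis and synthesis computations requested through the interface.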

  3. Algorithm

    An algorithm is an effective method for solving a problem expressed as a finite sequence of steps. Each algorithm is a list of well-defined instructions for completing a task. Starting from an initial state, the instructions describe a computation that proceeds through a well-defined series of successive states, eventually terminating in a final ending state.

  4. Super Vocodeur de Phase / Super Phase Vocoder

    The vocoder is an analysis/synthesis system that was originally meant to encode speech for transmission. Sounds are first fed to an encoder, a device that converts information from one format to another: the signal is passed through a bank of band-pass filters, each followed by an envelope follower that measures the amplitude within its frequency band. The resulting signals are sent to the decoder and applied to the corresponding filters in a synthesizer. Vocoders exist both as hardware and as software, and they are also well known as electronic musical instruments.

    Several vocoder systems exist. Among them, the LPC vocoder, which relies on linear predictive coding, is one of the methods used by AudioSculpt.

    Phase vocoders analyze sounds with the Short-Time Fourier Transform instead of a filter bank, and they track the phase of each frequency bin from one frame to the next. The Super Phase Vocoder developed by Ircam is an implementation of the phase vocoder that also gathers several other analysis methods. AudioSculpt can be regarded as the graphical interface of SuperVP.
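The role of phase in a phase vocoder can be sketched as follows: for a stable sinusoid, the phase of an STFT bin advances between two frames by the sinusoid's angular frequency times the hop size, so the wrapped deviation from the advance expected at the bin centre refines the frequency estimate well beyond the bin spacing. This is the standard textbook estimate, not SuperVP's code; the Hann window, sizes, test tone and function names are assumptions.

```python
import cmath
import math

def bin_value(signal, start, n_fft, k):
    """Complex DFT value of bin k for a Hann-windowed frame starting at `start`."""
    acc = 0j
    for m in range(n_fft):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * m / n_fft)
        acc += signal[start + m] * w * cmath.exp(-2j * math.pi * k * m / n_fft)
    return acc

def princarg(phi):
    """Wrap a phase to the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi

def instantaneous_frequency(signal, fs, n_fft, hop, k, start=0):
    """Frequency estimate (Hz) in bin k from the phase advance between two frames `hop` apart."""
    x0 = bin_value(signal, start, n_fft, k)
    x1 = bin_value(signal, start + hop, n_fft, k)
    omega_k = 2 * math.pi * k / n_fft  # bin centre, radians per sample
    expected = omega_k * hop           # phase advance of a sinusoid exactly at the bin centre
    deviation = princarg(cmath.phase(x1) - cmath.phase(x0) - expected)
    return (omega_k + deviation / hop) * fs / (2 * math.pi)

fs, n_fft, hop = 8000, 256, 64
f_true = 1010.0  # lies between bin 32 (1000 Hz) and bin 33 (1031.25 Hz)
sig = [math.sin(2 * math.pi * f_true * n / fs) for n in range(n_fft + hop)]
k = round(f_true * n_fft / fs)  # nearest bin
est = instantaneous_frequency(sig, fs, n_fft, hop, k)
print(k, est)  # bin 32, estimate very close to 1010 Hz
```

The estimate is only unambiguous while the deviation stays within plus or minus pi, which is why phase vocoders use hops that are a fraction of the frame length.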

  5. Sinusoidal Modeling
  6. FFT
  7. STFT
  8. Linear Predictive Model
  9. Fundamental Frequency
  10. Periodicity

    Periodicity is the property of a phenomenon that repeats itself identically after a given lapse of time, called the period. Each repetition of the phenomenon is a cycle. For a sound wave, the phenomenon is an oscillation: the pressure goes from its rest value to a peak of compression, back to rest, to a peak of rarefaction, and back to rest again. The number of cycles per second is the frequency, the inverse of the period.
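Periodicity is what makes fundamental frequency estimation possible: a signal that repeats every T samples correlates strongly with itself shifted by T, and its fundamental frequency is the sample rate divided by T. The sketch below uses plain autocorrelation, a classic method chosen for illustration and not necessarily the estimator AudioSculpt uses; the 60 to 400 Hz search range, the test tone and the function name `estimate_f0` are assumptions.

```python
import math

def estimate_f0(signal, fs, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) as fs / lag of the autocorrelation peak."""
    min_lag = int(fs / f_max)
    max_lag = int(fs / f_min)
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: shorter lags sum more terms, which favours
        # the true period over its multiples (2T, 3T, ...).
        r = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

fs = 8000
# Periodic tone: 200 Hz fundamental plus two weaker harmonics (period = 40 samples)
tone = [math.sin(2 * math.pi * 200 * n / fs)
        + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
        + 0.25 * math.sin(2 * math.pi * 600 * n / fs) for n in range(2000)]
f0 = estimate_f0(tone, fs)
print(f0)  # 200.0
```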

  11. Formant

    Formants were originally defined as peaks in a sound spectrum. Resonance and formant are conceptually distinct, but some writers on the voice use the terms interchangeably. Moreover, the acoustics of the vocal tract are often modeled with a mathematical filter whose pole frequencies fall close to the formant frequencies. As a result, some voice researchers now refer to the frequencies of the poles as formants.

    Hence, the term "formant" can denote a peak in the spectrum, a resonance of the vocal tract, or a pole in a mathematical filter model.

    In acoustics, a formant is originally a broad peak in the spectral envelope of a sound. The singer's formant and the actor's formant are broad peaks in the spectral envelope occurring around 3 kHz. In vocal sounds, the pattern of formants is what determines the perceived vowel.
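The "pole in a filter model" definition can be made concrete with a two-pole resonator. Assuming the common approximation that a pole pair at radius r and angles ±θ produces a formant at θ·fs/(2π) Hz with a bandwidth of about -ln(r)·fs/π Hz, the sketch builds the filter coefficients for one formant and recovers its frequency and bandwidth from the poles. The function names and the 700 Hz / 130 Hz values are illustrative assumptions.

```python
import cmath
import math

def resonator_coefficients(freq_hz, bandwidth_hz, fs):
    """Coefficients a1, a2 of the denominator 1 + a1 z^-1 + a2 z^-2 with poles r * exp(+-j*theta)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius from bandwidth (approximation)
    theta = 2 * math.pi * freq_hz / fs          # pole angle from centre frequency
    return -2 * r * math.cos(theta), r * r

def formant_from_poles(a1, a2, fs):
    """Recover (frequency, bandwidth) in Hz from the complex pole pair of a two-pole filter."""
    pole = (-a1 + cmath.sqrt(a1 * a1 - 4 * a2)) / 2  # upper root of z^2 + a1 z + a2
    freq = abs(cmath.phase(pole)) * fs / (2 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth

fs = 16000
a1, a2 = resonator_coefficients(700.0, 130.0, fs)  # illustrative first-formant values
freq, bw = formant_from_poles(a1, a2, fs)
print(freq, bw)  # approximately 700.0 and 130.0
```

In an LPC model of speech, the same pole-to-formant conversion is applied to each complex pole pair of the estimated all-pole filter.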

IRCAM. Made with Scenari.