Improving Source Extraction with Diffusion and Consistency Models

Welcome to the demo page for the paper "Improving Source Extraction with Diffusion and Consistency Models".

In this work, we investigate the integration of a score-matching diffusion model into a standard U-Net architecture for time-domain musical source separation. Since diffusion models typically suffer from slow, iterative sampling, we apply consistency distillation (CD) to accelerate sampling, bringing its speed close to that of a deterministic model with no loss in quality. Our model, trained on the Slakh2100 dataset to extract four instruments (bass, drums, guitar, and piano), shows significant improvements on objective metrics over the baseline methods.

On this page, we present separation demos for five scenarios: the deterministic model, the diffusion model, and CD with 1, 2, and 4 denoising steps.
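To illustrate how the 1-, 2-, and 4-step CD scenarios differ only in their sampling loop, here is a minimal sketch of multi-step consistency sampling conditioned on a mixture. It assumes a hypothetical distilled model `f(x, sigma, mix)` that maps a noisy source estimate at noise level `sigma` back to a clean estimate in a single call; the schedule values and the dummy model below are illustrative, not the paper's actual configuration.

```python
import numpy as np

def consistency_multistep_sample(f, mix, sigmas, rng):
    """Sketch of multi-step consistency sampling for source extraction.

    f      : distilled consistency model, f(x, sigma, mix) -> clean estimate
             (hypothetical stand-in for the distilled U-Net)
    mix    : conditioning mixture waveform, shape (channels, samples)
    sigmas : decreasing noise levels; len(sigmas) = number of denoising steps,
             e.g. [80.0] for 1-step or [80.0, 2.0] for 2-step sampling
    """
    # Start from pure noise at the highest level and denoise in one call.
    x = rng.standard_normal(mix.shape) * sigmas[0]
    x = f(x, sigmas[0], mix)
    for sigma in sigmas[1:]:
        # Each extra step re-noises the estimate to a lower level,
        # then denoises again, trading speed for quality.
        x = x + rng.standard_normal(mix.shape) * sigma
        x = f(x, sigma, mix)
    return x
```

With a list of one, two, or four `sigmas`, this same loop reproduces the three CD scenarios demoed below, each costing one network evaluation per step.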

Original


Track Mix Bass Drums Guitar Piano

Deterministic model


Track Mix Bass Drums Guitar Piano

Diffusion


Track Mix Bass Drums Guitar Piano

CD (1 step)


Track Mix Bass Drums Guitar Piano

CD (2 steps)


Track Mix Bass Drums Guitar Piano

CD (4 steps)


Track Mix Bass Drums Guitar Piano