Improving Source Extraction with Diffusion and Consistency Models
Welcome to the demo page for the paper "Improving Source Extraction with Diffusion and Consistency Models".
In this work, we investigate the integration of a score-matching diffusion model into a standard U-Net architecture for time-domain musical source separation. Because diffusion models sample iteratively and are therefore slow, we employ consistency distillation to accelerate sampling, bringing its speed closer to that of a deterministic model with no loss in quality. Our model, trained on the Slakh2100 dataset and targeting four instruments (bass, drums, guitar, and piano), demonstrates significant improvements across objective metrics compared to the baseline methods.
On this page, we present separation demos for five configurations: the deterministic model, the diffusion model, and consistency distillation (CD) with 1, 2, and 4 denoising steps.
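To illustrate what "CD with 1, 2 and 4 denoising steps" means in practice, here is a minimal sketch of generic multistep consistency sampling, in which each step costs one network evaluation. It is not the paper's implementation: `toy_consistency_fn` is a hypothetical stand-in for the distilled network (the exact consistency function for data concentrated at a single value `MU`), the noise levels in `schedules` are made-up values, and any conditioning on the input mixture is omitted.

```python
import numpy as np

EPS = 0.002   # assumed minimum noise level (hypothetical value)
MU = 0.5      # toy "clean signal" value, for illustration only


def toy_consistency_fn(x, t):
    """Hypothetical stand-in for a distilled network f(x, t).

    For data concentrated at MU, the probability-flow ODE trajectory
    satisfies x(t) - MU proportional to t, so mapping back to t = EPS gives:
    """
    return MU + (x - MU) * EPS / t


def multistep_consistency_sample(f, shape, timesteps, eps=EPS, rng=None):
    """Generic multistep consistency sampling.

    timesteps: decreasing noise levels; len(timesteps) = number of
    denoising steps = number of evaluations of f.
    """
    rng = np.random.default_rng() if rng is None else rng
    t_max = timesteps[0]
    # Step 1: map pure noise at the largest noise level to a clean estimate.
    x = f(t_max * rng.standard_normal(shape), t_max)
    # Remaining steps: re-noise to a smaller level, then denoise again.
    for t in timesteps[1:]:
        z = rng.standard_normal(shape)
        x_noisy = x + np.sqrt(t**2 - eps**2) * z
        x = f(x_noisy, t)
    return x


# Hypothetical schedules for 1, 2 and 4 denoising steps.
schedules = {1: [80.0], 2: [80.0, 1.0], 4: [80.0, 10.0, 1.0, 0.1]}
for n_steps, ts in schedules.items():
    sample = multistep_consistency_sample(
        toy_consistency_fn, (8,), ts, rng=np.random.default_rng(0)
    )
    print(n_steps, "step(s):", sample.shape)
```

With a single step the sampler costs the same as one forward pass of a deterministic model; additional steps trade extra evaluations for the chance to refine the estimate.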