The Tools We Use
Version 1.5, May 2005
Since October 2004, I have been surveying the MIR community (via
the music-ir@ircam.fr mailing list) about the tools that MIR
researchers use. I have compiled this list from the input of MIR
researchers around the world. I'd like to keep this list active and
up to date, so if you know of any tools that you think should be on
this list, let me know. Thanks for all the input! -- Paul Lamere:
(email, blog) - Sun Labs.
Machine Learning
- M2K - M2K
represents the music-specific set of D2K modules designed
to create a Virtual Research Lab (VRL) for MIR/MDL development,
prototyping and evaluation. M2K provides the framework for the
MIREX (Music Information
Retrieval Evaluation eXchange) contest, an annual MIR evaluation. D2K,
together with a subsidiary set of modules called T2K
(Text-to-Knowledge), provides the basic foundation upon which M2K is
being developed. D2K/T2K are the result of an ongoing research and
development project of the Automated Learning Group (ALG) at NCSA. M2K License:
BSD-Like
- Weka - Weka is
a collection of machine learning algorithms for data
mining tasks written in the Java programming language. Weka contains
tools for
data pre-processing, classification, regression, clustering,
association rules, and visualization. It is also well-suited for
developing new machine learning schemes. The book Data Mining
complements the Weka software. License:
GNU General Public License (GPL).
- Marsyas - Marsyas
is a software framework for rapid prototyping and experimentation with
computer audition applications with specific emphasis on Music Information Retrieval.
Marsyas provides a general, extensible and flexible architecture
that allows easy experimentation with algorithms and provides fast
performance that is useful in developing real time audio analysis
tools. A variety of existing building blocks that form the basis
of most published algorithms in Computer Audition are already available
as part of the package. Marsyas is written in C++ and Java and is
actively being developed by George Tzanetakis. License: GNU
General Public License (GPL)
- Torch
- Torch is a
machine learning library written in C++
that works on most Unix/Linux platforms. It can be
used to train MLPs, RBFs, HMMs, Gaussian mixtures,
K-means, mixtures of experts, Parzen windows, and KNN,
and can be easily extended so that you can add
your own machine learning algorithms. Torch is currently
developed at IDIAP and is described in the paper
"Torch: a modular machine learning software library". Torch3
has been successfully tested on Linux, SunOS, FreeBSD,
OSF1, Mac OS X and even MS Windows. License: Torch3 is
free, distributed under a BSD license.
- NODElib
- The Neural
Optimization Development Engine library is a programming library for
rapidly developing powerful
neural network simulations. The code is extremely modular, compact, and
robust.
It is written in an object-oriented manner. All of the library
code, example and test program source, documentation, and
supporting text total only about 20,000 lines, which
means that NODElib is extremely compact. NODElib is written in C.
License: GNU
General Public License (GPL).
- SVM
- this package
defines support vector machines (SVMs) for both
classification and regression problems. The SVMs can use a wide
variety of kernel functions. Optimization of the SVMs is
performed by a variation of John Platt's sequential minimal
optimization (SMO) algorithm. This version of SMO is generalized
for regression, uses kernel caching, and incorporates several
heuristics; for these reasons, we refer to the optimization
algorithm as SMORCH.
SMORCH
has been shown to be over an order of
magnitude faster than SMO, QP, and decomposition. License: GNU
General Public License (GPL).
- LAPACK/BLAS (Linux
version available from Intel) for matrix
math - The BLAS (Basic Linear Algebra Subprograms) are high quality
"building block" routines for performing basic vector and matrix
operations. Level 1 BLAS do vector-vector operations, Level 2
BLAS do matrix-vector operations, and Level 3 BLAS do
matrix-matrix operations. Because the BLAS are efficient,
portable, and widely available, they're commonly used in the
development of high quality linear algebra software, such as LINPACK and LAPACK. License: Commercial
License
- EMD
- an implementation of the Earth Mover's Distance. The EMD computes the
distance between two distributions, which are represented by
signatures. The signatures are sets of weighted features that capture
the distributions. The features can be of any type and in any number of
dimensions, and are defined by the user. License: unknown.
- BNT
- Bayes Net Toolbox for Matlab - supports many types of
conditional probability distributions, decision and utility
nodes, as well as chance
nodes, static and dynamic BNs, many different inference
algorithms, several methods for parameter learning, regularization and
structure learning. License: GNU
Library GPL
- Auditory
Toolbox - a collection of tools that implement several popular
auditory models for MATLAB. This
toolbox will also be useful to speech and auditory engineers who want
to see how the human auditory system represents sounds. License: unknown
- Netlab toolbox
- consists of a toolbox of Matlab functions and scripts based on
the approach and techniques described in Neural
Networks for Pattern Recognition
by
Christopher M. Bishop, (Oxford University Press, 1995), but also
including more recent developments in the field. There is an
accompanying textbook, Netlab:
Algorithms for Pattern Recognition. License: BSD-Style
- SOM Toolbox
for Matlab - an
implementation of the SOM and its visualization in
the Matlab 5 computing environment. The Toolbox can be used to
preprocess data, initialize and train SOMs using a range of different
kinds of topologies, visualize SOMs in various ways and analyze the
properties of the SOMs and the data. With data mining in mind,
the Toolbox and the SOM in general are best suited for the data
understanding phase. License:
GNU
General Public License
- MA Toolbox
for Matlab - Implementing Similarity Measures for Audio -
The
MA Toolbox is a collection of functions for Matlab 6 or higher. It
contains functions to analyze music (audio) and compute
similarities. License: GNU General Public License
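The three BLAS levels described in the LAPACK/BLAS entry above can be illustrated with a minimal pure-Python sketch. The function names below (`dot`, `matvec`, `matmul`) are hypothetical and chosen only for illustration; they are not the BLAS routine names or interface, and a real BLAS implementation would provide heavily optimized versions of these kinds of operations.

```python
# Illustrative sketch only: plain-Python versions of the three kinds of
# operations that the BLAS levels cover. The function names are hypothetical
# and do not correspond to the actual BLAS interface.


def dot(x, y):
    """Level 1 style: a vector-vector operation (inner product)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Level 2 style: a matrix-vector product, one inner product per row."""
    return [dot(row, x) for row in A]


def matmul(A, B):
    """Level 3 style: a matrix-matrix product."""
    columns = list(zip(*B))  # the columns of B, each as a tuple
    return [[dot(row, col) for col in columns] for row in A]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    print(dot(x, x))
    print(matvec(A, x))
    print(matmul(A, A))
```

Matrices here are plain lists of rows, which keeps the sketch dependency-free; it is meant to show the shape of each level, not to be fast.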
Music Processing
- ChucK :
Concurrent, On-the-fly Audio Programming Language. ChucK is an audio
programming language for
real-time synthesis, composition, and performance, which runs on
commodity operating systems. ChucK presents a new time-based concurrent
programming model, which supports
multiple, simultaneous, dynamic control rates, and the ability to add,
remove, and modify code, on-the-fly, while the program is running,
without stopping or restarting. It offers composers, researchers, and
performers a powerful and flexible programming tool for building and
experimenting with complex audio synthesis programs, and real-time
interactive control. License:
GNU
General Public License (GPL)
- CSound - Csound is a
programming language designed and optimized for sound rendering and
signal processing. It provides facilities for composition and
performance over a wide range of platforms. License: GNU
Library or Lesser General Public License (LGPL)
- SuperCollider -
SuperCollider is an environment and programming language for real time
audio synthesis for MacOS. You can write programs to generate or
process sound in
real time or non real time. SuperCollider can be controlled by MIDI,
the
mouse, Wacom graphics tablet, and over a network via Open Sound
Control. SuperCollider is mostly like Smalltalk but has a different
syntax. License: Free, but not
open-source due to a MacZoop dependency.
- The MIDI Toolbox
- a compilation of functions for analyzing and visualizing MIDI files
in the Matlab computing
environment. Besides
simple manipulation and filtering functions, the toolbox contains
cognitively inspired analytic techniques that are suitable for
context-dependent musical analysis that deal with such topics as
melodic contour, similarity, key-finding, meter-finding and
segmentation. License: GNU General Public
License
- BeatRoot -
An Interactive Beat Tracking and Visualisation System. BeatRoot is able
to estimate the tempo and the times of musical beats in expressively
performed music. License:
GNU General Public License
- The Humdrum
Toolkit - a set of general-purpose software tools intended to
assist music researchers in posing and answering research questions.
Humdrum allows researchers to encode, manipulate, and output a wide
variety of musically-pertinent representations. The emphasis is on
posing and answering questions about music. License: Although the Humdrum
Toolkit is free of charge, each copy must be registered in order to
establish legal ownership of the copy.
- Marsyas - Marsyas
is a software framework for rapid prototyping and experimentation with
computer audition applications with specific emphasis on Music Information Retrieval.
Marsyas provides a general, extensible and flexible architecture
that allows easy experimentation with algorithms and provides fast
performance that is useful in developing real time audio analysis
tools. A variety of existing building blocks that form the basis
of most published algorithms in Computer Audition are already available
as part of the package. Marsyas is written in C++ and Java and is
actively being developed by George Tzanetakis. License: GNU
General Public License (GPL)
- JTranscriber
- an interactive automatic transcription system which recognizes
musical notes and converts them into MIDI format, displaying the audio
data as a spectrogram with the MIDI data overlaid in piano roll
notation, and allowing interactive monitoring and correction of the
extracted MIDI data. License: Unavailable
- MusicXML - a
universal translator for common Western musical notation from the 17th
century onwards. It is designed as an interchange format for notation,
analysis, retrieval, and performance applications. License: Royalty Free
- Optical
Music Recognition Systems - Donald Byrd at the School of Music,
Indiana University has an excellent table describing the available set
of OMR systems. License: Various,
see individual programs.
- Finale
- a music notation program. License:
commercial
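As a small illustration of the note-to-MIDI step mentioned in the JTranscriber entry above, the sketch below maps a fundamental frequency in hertz to the nearest MIDI note number. It assumes twelve-tone equal temperament with A4 = 440 Hz at MIDI note 69. It is not JTranscriber's code; `freq_to_midi` and `midi_to_freq` are hypothetical helper names.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4


def freq_to_midi(freq_hz):
    """Return the nearest MIDI note number for a frequency in hertz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Twelve semitones per octave, one octave per doubling of frequency.
    return int(round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ)))


def midi_to_freq(note):
    """Return the equal-tempered frequency in hertz of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


if __name__ == "__main__":
    for f in (261.63, 440.0, 880.0):
        print(f, freq_to_midi(f))
```

A real transcription system also has to find the fundamental frequency and the note boundaries in the audio; this sketch covers only the final mapping.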
Sound Libraries
- libsndfile
- Libsndfile is a C library for reading and writing files containing
sampled sound (such as MS Windows WAV and the Apple/SGI AIFF format)
through one standard library interface. It is released in source code
format under the GNU
Lesser General Public License.
- portaudio
- PortAudio is
a free, cross platform, open-source,
audio I/O library. It lets you write simple audio programs in 'C'
that will compile and run on many platforms. License: BSD-style open source
license.
- Improv
- a C++
environment for writing programs that enable musician/computer
interaction using MIDI instruments. Improv programs can be
written in special pre-defined environments, or
they can be written from scratch using just the basic MIDI input and output
classes. License: Non-open-source - can be
used for non-commercial purposes including music composition, music
performance or academic research and education. All other uses of
Improv must be licensed.
- RtAudio
- a set of C++ classes which provide a common API for
realtime
audio input/output across Linux
(native ALSA, JACK, and OSS), Macintosh OS X, SGI, and Windows
(DirectSound and ASIO) operating systems. RtAudio
significantly simplifies the process of interacting with computer audio
hardware. License: BSD-style
open source license.
- stk -
The Synthesis ToolKit in C++ is a set of open source
audio signal processing and algorithmic synthesis classes written in
the C++ programming language. STK was designed to facilitate rapid
development of music synthesis and audio processing software, with an
emphasis on cross-platform functionality, realtime control, ease of
use, and educational example code. The Synthesis ToolKit is extremely
portable (it's mostly platform-independent C and C++ code), and it's
completely user-extensible (all source included, no unusual libraries,
and no hidden drivers). License:
non-standard, open-source license:
This software was designed and created to be made publicly available
for free, primarily for academic purposes, so if you use it, pass it on
with this documentation, and for free. If you make a million dollars
with it, give us some. If you make compositions with it, put us in the
program notes.
- MAD - a
high-quality MPEG audio decoder. It currently supports MPEG-1
and the MPEG-2
extension to lower sampling frequencies, as well as the de facto
MPEG 2.5
format. All three audio layers (Layer I, Layer II, and
Layer III, i.e. MP3) are fully implemented. License: GNU General
Public License, Version 2.
- lame - LAME is an LGPL
MP3 encoder. The open source development model has allowed its
quality and speed to improve since 1999. It is now a highly evolved MP3 encoder,
with quality and speed able to rival state of the art commercial
encoders. License: GNU
General Public License (GPL), GNU
Library or Lesser General Public License (LGPL)
- JavaSound
- provides low-level support for audio
operations such as audio playback and capture (recording), mixing, MIDI
sequencing, and MIDI synthesis in an extensible, flexible framework. License: Sun
License
- PortMusic-
PortMusic
is a set of APIs and library implementations for music. PortMusic
consists of three libraries: PortAudio
for real-time audio input/output, PortMidi
for real-time MIDI input/output and PortSoundFile
for sound file input and output. License:
BSD-like open source.
- JID3 - JID3 is a Java
library for processing MP3 metadata (aka tags). It supports
reading and writing ID3 V1.0, V1.1, and V2.3.0 MP3 tags. License: GNU
Library or Lesser General Public License (LGPL)
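To make the ID3 V1.0 and V1.1 tags mentioned in the JID3 entry above more concrete, here is a minimal Python sketch that reads such a tag from the last 128 bytes of an MP3 file's contents. It is not the JID3 API (JID3 is a Java library). The function name `parse_id3v1` is hypothetical, and decoding the text fields as Latin-1 is an assumption made for this sketch.

```python
def parse_id3v1(data):
    """Parse an ID3v1 or ID3v1.1 tag from the end of `data` (a bytes object).

    Returns a dict of fields, or None if no tag is present.
    """
    if len(data) < 128:
        return None
    tag = data[-128:]              # the tag occupies the final 128 bytes
    if tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width and padded with NUL bytes or spaces.
        # Latin-1 decoding is an assumption of this sketch.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    comment = tag[97:127]
    track = None
    # ID3v1.1: a zero byte followed by a non-zero byte at the end of the
    # comment field means the last byte holds the track number.
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "track": track,
        "genre": tag[127],         # numeric genre index
    }
```

Typical use would be to read a file in binary mode and pass its contents (or just its last 128 bytes) to the function. ID3 V2.3.0 tags sit at the start of the file and use a different, frame-based layout that this sketch does not handle.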
Audio / Signal Processing
- aubio - aubio is a library
(written in C) for audio labelling. The aim is to add these automatic
labelling features to other audio
software. Functions can be used offline in sound editors and software
samplers, or online in audio effects and virtual instruments.
Features include onset
detection, silence detection and pitch
detection. License: GNU
General Public License (GPL)
- Matlab -
a high-level technical computing language and interactive environment
for algorithm development, data visualization, data analysis, and
numerical computation. License: commercial
- Marsyas - Marsyas
is a software framework for rapid prototyping and experimentation with
computer audition applications with specific emphasis on Music Information Retrieval.
Marsyas provides a general, extensible and flexible architecture
that allows easy experimentation with algorithms and provides fast
performance that is useful in developing real time audio analysis
tools. A variety of existing building blocks that form the basis
of most published algorithms in Computer Audition are already available
as part of the package. Marsyas is written in C++ and Java and is
actively being developed by George Tzanetakis. License: GNU
General Public License (GPL)
- CLAM - a
full-fledged
software framework for research and
application development in the Audio and Music Domain. It offers a
conceptual model as well as tools for the analysis, synthesis and
transformation of audio signals. License:
GNU General
Public License
(GPL).
- Sphinx 3
- Sphinx-3 is an open source speech recognition system.
Sphinx 3.x is a recent implementation for speech-to-text
recognition, its main goal being speed improvements over the original
Sphinx-3 decoder. The front-end of Sphinx-3 is used by MIR
researchers to generate MFCC features. License:
BSD
License
- Cakewalk
Sonar - a recording studio in a box. Multitrack recording,
editing,
mixing, and delivery. License:
Commercial.
- HTK - The Hidden
Markov Model Toolkit (HTK) is a portable toolkit for
building and manipulating hidden Markov models. HTK is primarily used
for speech recognition research although it has been used for numerous
other applications including research into speech synthesis, character
recognition and DNA sequencing. License: Free download but
not redistributable
- sox - SoX
is
a command line utility that can convert various formats
of computer audio files into other formats. It
can also apply various effects to these sound files during
the conversion. License:
GNU
Library or Lesser General Public License (LGPL)
- Audacity - Audacity
is a free audio editor. You can record sounds, play sounds, import and
export WAV, AIFF, Ogg Vorbis, and MP3 files, and more. Use it to edit
your sounds using Cut, Copy and Paste (with unlimited Undo), mix tracks
together, or apply effects to your recordings. It also has a built-in
amplitude envelope editor, a customizable spectrogram mode and a
frequency analysis window for audio analysis applications. It also
supports VST and LADSPA plug-in effects. License: GNU
General Public License (GPL)
- Nyquist -
Nyquist is an open-source language for sound analysis and synthesis. It
is implemented in C and C++ and runs on Win32, OS X, and Linux. Nyquist
offers a powerful and efficient functional programming model for signal
processing, and is particularly good at working with large amounts of
data because it automatically streams data rather than allocating large
arrays in primary memory. In addition to audio processing, Nyquist
offers a full Lisp interpreter and MIDI input/output making it suitable
for general purpose programming. License:
BSD-like open source.
- PureData - PureData is a
real-time graphical programming environment for audio, video, and
graphical processing. License: BSD-like
open source.
- SNDAN
- SNDAN is an open source collection of programs for spectral analysis,
display, modification, and resynthesis of musical sounds. It runs under
Unix or Linux. It includes phase-vocoder analysis which may be tuned to
any fundamental frequency (pitch) and frequency-tracking analysis which
performs analysis of sounds with highly variable pitch. It also
includes a pitch detector for plotting musical pitch vs. time.
Documentation is included. Also, two different versions of SNDAN for
Windows/DOS are available by other parties. License: free for download after
registration by email.
- Armadillo
- Armadillo is a spectral analysis/visualization program for the
Macintosh computer. It runs native under OS 8.x/9.x or under Classic in
OS 10.x. Analysis by phase vocoder can be performed in real time or out
of real time and can be untuned or tuned to a specific fundamental
frequency (pitch). Visual panels are 1D (amplitude vs. frequency),
"waterfall" (amplitude vs. frequency overlay), 2D spectrogram
(frequency vs. time), 3D (amplitude vs. frequency vs. time), and
waveform (amplitude vs. time) and can be run simultaneously. Tutorials
are provided. License: free
for download.
- Music
4C - This is an open source program for designing synthesis
algorithms in the C language and performing an orchestra of instruments
using a numerical score. It runs under Unix or Linux. Scores can be
generated from music-notation-like score files. Orchestras are provided
that play several simple instruments as well as sample files and
spectral analysis files. A tutorial manual is included. License: free for download after
registration by email.
- Praat - Praat
is a program for speech analysis and synthesis. License: GNU General Public License
(GPL)
- GoldWave - a digital audio
editor. It includes all of the common audio editing commands and
effects, plus built-in tools such as a batch processor/converter,
a CD reader, and audio restoration filters. License: commercial
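The aubio entry above lists silence detection among its features. One common way to approach the task is to compare the RMS energy of each frame against a decibel threshold, sketched minimally below. This is an illustration under stated assumptions, not aubio's algorithm or API: the function names, the 512-sample frame size, and the -60 dB default threshold are all hypothetical choices, and samples are assumed to be floats in the range -1 to 1.

```python
import math


def frame_rms_db(frame, floor_db=-120.0):
    """RMS level of a frame in dB relative to full scale (1.0).

    Empty or all-zero frames return `floor_db` instead of minus infinity.
    """
    if not frame:
        return floor_db
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square <= 0.0:
        return floor_db
    # 10 * log10(mean square) equals 20 * log10(RMS).
    return max(floor_db, 10.0 * math.log10(mean_square))


def silent_frames(samples, frame_size=512, threshold_db=-60.0):
    """Return one boolean per non-overlapping frame: True means silent."""
    flags = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms_db(frame) < threshold_db)
    return flags


if __name__ == "__main__":
    quiet = [0.0] * 512
    loud = [0.5, -0.5] * 256
    print(silent_frames(quiet + loud))
```

Non-overlapping frames keep the sketch short; practical detectors often use overlapping windows and some smoothing so that brief dips in level are not reported as silence.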
Visualization
- Matlab -
a high-level technical computing language and interactive environment
for algorithm development, data visualization, data analysis, and
numerical computation. License: commercial
- Octave - a high-level
language, primarily intended for numerical
computations. It provides a convenient command line interface for
solving linear and nonlinear problems numerically, and for performing
other numerical experiments using a language that is mostly compatible
with Matlab. It may also be used as a batch-oriented language. License: GNU General Public License
(GPL)
- The MIDI Toolbox
- a compilation of functions for analyzing and visualizing MIDI files
in the Matlab computing
environment. Besides
simple manipulation and filtering functions, the toolbox contains
cognitively inspired analytic techniques that are suitable for
context-dependent musical analysis that deal with such topics as
melodic contour, similarity, key-finding, meter-finding and
segmentation. License: GNU General Public
License
- Adobe
Audition - (formerly Cool Edit) - an audio
editing environment that offers advanced recording, mixing, editing,
and effects processing capabilities. License: Commercial
- ipe - drawing editor for
creating figures in PDF or (encapsulated)
PostScript format. It supports making small figures for inclusion
into LaTeX documents as well as making multi-page PDF presentations
that can be shown on-line with Acrobat Reader. License: GNU General Public License
- Excel
- Microsoft's spreadsheet. License:
Commercial
- freeGLUT / openGL - freeglut is a completely
OpenSourced alternative to the OpenGL Utility Toolkit (GLUT) library.
GLUT (and hence freeglut) allows the user to create and manage windows
containing OpenGL contexts on a wide range of platforms and also read
the mouse, keyboard and joystick functions. License: MIT
License
- qwt - a graphics
extension to the Qt GUI application framework. It provides a 2D
plotting widget and more. License:
Qwt
License, Version
1.0.
- qwtplot3d - a
feature-rich Qt/OpenGL-based C++ programming
library that provides essentially
a set of 3D-widgets for programmers. License: GNU
Library or Lesser General Public License (LGPL)
- qt
- a complete C++ application development framework, which includes a
class library and tools for cross-platform development and
internationalization. License: Commercial
- MFC
- a collection of classes (generalized definitions used in
object-oriented programming) that can be used in building application programs. The classes in the MFC
Library are written in the C++ programming language. License: Commercial
- wxWidgets
- a class library that allows
you to compile graphical C++ programs on a range of
different platforms. wxWidgets defines a common API across platforms,
but uses the native graphical user interface (GUI) on each platform,
so your program will take on the native 'look and feel' that users are
familiar with. License: Modified Library General Public
License
- wxDesigner
- A commercial
dialog editor and RAD tool for the free
wxWidgets GUI library. License:
Commercial
- JFreeChart - JFreeChart is
a free Java class library for generating charts, including: pie charts
(2D and 3D); bar charts (regular and stacked, with an optional 3D
effect); line and area charts; scatter plots and bubble charts; time
series, high/low/open/close charts and candle stick charts; combination
charts; Pareto charts; Gantt charts; wind plots, meter charts and
symbol charts; and wafer map charts. License:
GNU
Library or Lesser General Public License (LGPL)
Algorithm Design / Prototyping
- Matlab -
a high-level technical computing language and interactive environment
for algorithm development, data visualization, data analysis, and
numerical computation. License: commercial
- Octave - a high-level
language, primarily intended for numerical
computations. It provides a convenient command line interface for
solving linear and nonlinear problems numerically, and for performing
other numerical experiments using a language that is mostly compatible
with Matlab. It may also be used as a batch-oriented language. License: GNU General Public License
(GPL)
- M2K - M2K
represents the music-specific set of D2K modules designed
to create a Virtual Research Lab (VRL) for MIR/MDL development,
prototyping and evaluation. M2K provides the framework for the
MIREX (Music Information
Retrieval Evaluation eXchange) contest, an annual MIR evaluation. D2K,
together with a subsidiary set of modules called T2K
(Text-to-Knowledge), provides the basic foundation upon which M2K is
being developed. D2K/T2K are the result of an ongoing research and
development project of the Automated Learning Group (ALG) at NCSA. M2K License:
BSD-Like
- LabWindows
- a C programming and development environment. It
includes toolkits for digital signal processing, UI design, data
analysis and visualization. License:
Commercial
Parallel Processing
- LAM/MPI - a
high-quality open-source implementation of the Message Passing Interface
specification, including all of MPI-1.2 and much of MPI-2. Intended for
production as well as research use, LAM/MPI includes a rich set of
features for system administrators, parallel programmers, application
users, and parallel computing researchers. License: BSD-Style license
- PBS - Portable
Batch System - a flexible batch queuing system developed for NASA in the early to mid-1990s.
It operates on networked, multi-platform UNIX environments. License: Software License
- Linux
Cluster - Beowulf.org,
Beowulf Clusters are scalable performance clusters
based on commodity hardware, on a private system network, with open
source software (Linux) infrastructure. The designer can improve
performance proportionally with added machines. The commodity hardware
can be any of a number of mass-market, stand-alone compute nodes as
simple as two networked computers each running Linux and sharing a file
system or as complex as 1024 nodes with a high-speed, low-latency
network. License: GNU
General Public License
General Audio and Music Processing Resources
- Harmony
Central - an excellent source of audio programming tools and
resources
- SoftSynth
- a wide variety of computer music links
- FreshMeat -
Freshmeat's Sound/Audio software category lists more than 200 varied
applications dealing with audio and MIDI.
Developer Tools
MIR researchers use a wide range of programming tools:
- Operating Systems: Linux,
OS X, Solaris, Windows
- Programming Languages: C,
C++, C#, Delphi, Java, Perl, Tcl/Tk, Matlab, Excel, awk
- Database: MySQL,
Visual FoxPro
- Documentation: LaTeX,
StarOffice, Word
- Repository: SourceForge.net
- Editors / IDE / Misc developer
tools: emacs,
vi, Visual Studio, Eclipse, NetBeans, JBuilder, gnuplot, autoconf
- Web Tools: Apache,
JavaScript, cgi-bin, servlets/JSPs, TYPO3
Disclaimer: Links,
descriptions and license info may be wrong. Use at your
own risk.
Version History:
Version 1.0 - November 05, 2004
Version 1.1 - November 07, 2004. Added Auditory Toolbox, Netlab toolbox, SOM
toolbox, MA toolbox, BayesNet Matlab toolbox
Version 1.2 - November 15, 2004. Added Audacity, Nyquist and Port
Version 1.3 - December 7, 2004. Added aubio, PureData, SuperCollider,
HTK
Version 1.4 - December 13, 2004. Added M2K, fixed some typos and
formatting problems
Version 1.5 - May 9, 2005, added a number of new tools (six months of
submissions)