NeuralAudio is a C++ library designed to make it easy to use neural network machine learning models (e.g., guitar amplifier captures/profiles) in real-time audio applications.
NeuralAudio currently supports the following model types:
- Neural Amp Modeler (NAM) WaveNet and LSTM models, A1 and A2 support
- RTNeural keras models (LSTM, GRU)
This repository is licensed under the MIT license. It is a liberal license, but please make sure that you comply with the terms - as well as the terms of this project's dependencies. I would also appreciate it if you would let me know if you are using this library.
By default, NeuralAudio uses its own implementation of WaveNet and LSTM network models. This implementation has been designed to produce exactly the same output as the NAM Core library, but with increased performance and reduced memory usage.
For completeness, and to facilitate accuracy and performance benchmarking, it can also load models using the NAM Core implementation and RTNeural.
The internal NeuralAudio implementation currently outperforms the other implementations on all tested platforms (Windows x64, Linux x64/Arm64). It also uses significantly less memory than the NAM Core WaveNet implementation (which, for example, uses about 10x as much memory for a "standard" WaveNet model).
For WaveNet, the internal implementation supports optimized static models of the official NAM A1 network architectures: "Standard", "Lite", "Feather", "Nano".
For LSTM, the internal implementation supports optimized static model architectures for 1x8, 1x12, 1x16, 1x24, 2x8, 2x12, and 2x16 models.
All NAM files with WaveNet and LSTM architectures not supported internally will fall back on a less performant dynamic implementation (although still faster than NAM Core).
All keras models not supported internally will fall back to the RTNeural implementation.
NAM A2 models currently use the NAM Core implementation.
Models are loaded with a model loader:
NeuralModelLoader loader;
NeuralModel* model = loader.CreateFromFile("<path to model file>");
To process a model:
model->Process(pointerToFloatInputData, pointerToFloatOutputData, numSamples);
Some models need to allocate memory based on the size of the audio buffers being used. You need to make sure that processing does not exceed the specified maximum buffer size.
The default maximum size is 128 samples. To change it, change the default size on the model loader:
loader.SetDefaultMaxAudioBufferSize(maxSize);
If you want to change the maximum buffer size of an already created model, do:
model->SetMaxAudioBufferSize(maxSize);
Note: this is not real-time safe, and should not be done on a real-time audio thread.
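If your host can hand you buffers larger than the configured maximum, one option is to split them before calling Process(). The helper below is a minimal sketch of that idea and is not part of NeuralAudio: `ProcessInChunks` and its parameters are hypothetical names, and it assumes only that the processing call takes an input pointer, an output pointer and a sample count, as shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of NeuralAudio): calls process(in, out, n)
// repeatedly so that n never exceeds maxBlockSize.
// `process` is any callable with the same argument order as model->Process().
template <typename ProcessFn>
void ProcessInChunks(ProcessFn&& process, float* input, float* output,
                     std::size_t numSamples, std::size_t maxBlockSize)
{
    assert(maxBlockSize > 0);

    std::size_t offset = 0;

    while (offset < numSamples)
    {
        // Never hand the model more samples than it has allocated for.
        const std::size_t blockSize = std::min(maxBlockSize, numSamples - offset);

        process(input + offset, output + offset, static_cast<int>(blockSize));

        offset += blockSize;
    }
}
```

With a loaded model, the callable could be a lambda such as `[&](float* in, float* out, int n) { model->Process(in, out, n); }`, with `maxBlockSize` set to the same value you passed to the loader or the model.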
Use model->GetRecommendedInputDBAdjustment() and model->GetRecommendedOutputDBAdjustment() to obtain the ideal input and output volume level adjustments in dB.
To set a known audio input level (e.g., from an audio interface), use loader.SetAudioInputLevelDBu(float audioDBu). This is set at 12 dBu by default.
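The recommended adjustments are expressed in dB, while sample buffers hold linear amplitudes. If you apply an adjustment yourself, the standard conversion is gain = 10^(dB/20). The sketch below shows that conversion; `DbToLinearGain` and `ApplyGainDb` are hypothetical helper names, not NeuralAudio functions, and it assumes the host is responsible for applying the gain.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Converts a level change in decibels to a linear amplitude multiplier.
inline float DbToLinearGain(float dB)
{
    return std::pow(10.0f, dB / 20.0f);
}

// Scales a buffer in place by the given dB adjustment.
inline void ApplyGainDb(float* buffer, std::size_t numSamples, float dB)
{
    assert(buffer != nullptr || numSamples == 0);

    const float gain = DbToLinearGain(dB);

    for (std::size_t i = 0; i < numSamples; ++i)
    {
        buffer[i] *= gain;
    }
}
```

For example, the input buffer could be scaled with the value from model->GetRecommendedInputDBAdjustment() before processing, and the output buffer with model->GetRecommendedOutputDBAdjustment() afterwards.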
By default, models are loaded using the internal NeuralAudio implementation. If you would like to force the use of the NAM Core or RTNeural implementations, you can use:
loader.SetWaveNetLoadMode(loadMode);
and
loader.SetLSTMLoadMode(loadMode);
where "loadMode" is one of:
NeuralAudio::EModelLoadMode::Internal
NeuralAudio::EModelLoadMode::NAMCore
NeuralAudio::EModelLoadMode::RTNeural
You can check which implementation was actually used to load the model with model->GetLoadMode().
NOTE: Because of compile time and executable size considerations, only the internal, NAM Core and dynamic RTNeural implementations are built by default. If you want to use RTNeural, it is recommended that you add -DBUILD_STATIC_RTNEURAL=ON to your cmake commandline. This will create static model implementations for the same sets of WaveNet and LSTM models as the internal implementation, and results in increased performance. Internal static LSTM model support is also off by default - to turn it on use -DBUILD_INTERNAL_STATIC_LSTM=ON.
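Forcing a load mode is mostly useful for comparing implementations. One simple accuracy check is to run the same input through two models loaded with different modes and look at the largest per-sample difference. The function below is a minimal sketch of that comparison; `MaxAbsDifference` is a hypothetical name and is not taken from the library or its "Utils" tools.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Largest absolute per-sample difference between two output buffers.
inline float MaxAbsDifference(const float* a, const float* b, std::size_t numSamples)
{
    assert((a != nullptr && b != nullptr) || numSamples == 0);

    float worst = 0.0f;

    for (std::size_t i = 0; i < numSamples; ++i)
    {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }

    return worst;
}
```

A typical use would be to process one block with a model loaded in Internal mode and again with one loaded in NAMCore mode, then pass both output buffers to this function.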
Some models (notably NAM A2 models) are composed of multiple sub-models. By default, all sub-models will be fully loaded and initialized when the model is loaded.
If you wish to avoid the overhead of initializing unused models and only initialize the active model on load, you can do:
loader.SetCompositeModelLoadMode(ECompositeModelLoadMode::OnDemand);
Note that this means that switching to a different model for the first time via quality scaling will not be realtime safe.
Some models (notably, slimmable NAM A2 models) support quality scaling - trading off quality for performance.
Quality scaling is a floating point range from 0.0 (highest performance) to 1.0 (highest quality).
To set the default quality scaling factor, set it on the loader:
loader.SetDefaultQualityScaleFactor(scaleFactor);
To check if a model supports quality scaling, do:
if (model->HasQualityScaling()) ...
To set the quality scaling factor for a loaded model, do:
model->SetQualityScaleFactor(scaleFactor);
Note: This operation is not real-time safe if the quality scale factor results in switching to an uninitialized model. If you are using the default composite model loading behavior, setting the quality scale factor is always real-time safe. If you are using "OnDemand" composite model loading, you can check whether a quality scale change is real-time safe by doing:
if (!model->IsQualityChangeRealtimeSafe(newScaleFactor))
{
    // call SetQualityScaleFactor(), but ensure it is not done in a real-time context
}
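One way to act on that check is to apply safe changes immediately and park unsafe ones for a non-real-time thread. The class below is a rough sketch of that pattern, not something NeuralAudio provides: `PendingQualityChange` and its method names are hypothetical, and it is written as a template so it relies only on the IsQualityChangeRealtimeSafe() and SetQualityScaleFactor() calls shown above. It also assumes you handle any synchronisation needed between the worker thread and audio processing yourself, since that is not covered here.

```cpp
#include <atomic>
#include <cassert>

// Sentinel meaning "nothing is waiting". Valid factors are 0.0 to 1.0.
constexpr float kNoPendingChange = -1.0f;

// Hypothetical helper (not part of NeuralAudio): holds a quality change that
// could not be applied on the audio thread so another thread can apply it.
class PendingQualityChange
{
public:
    // Audio thread: apply immediately if safe, otherwise park the request.
    // Returns true if the change was applied right away.
    template <typename Model>
    bool RequestFromAudioThread(Model& model, float scaleFactor)
    {
        assert(scaleFactor >= 0.0f && scaleFactor <= 1.0f);

        if (model.IsQualityChangeRealtimeSafe(scaleFactor))
        {
            model.SetQualityScaleFactor(scaleFactor);
            return true;
        }

        pending.store(scaleFactor, std::memory_order_release);
        return false;
    }

    // Non-real-time thread: apply a parked request, if there is one.
    // Returns true if a change was applied.
    template <typename Model>
    bool ApplyFromWorkerThread(Model& model)
    {
        const float requested =
            pending.exchange(kNoPendingChange, std::memory_order_acq_rel);

        if (requested < 0.0f)
            return false;

        model.SetQualityScaleFactor(requested);
        return true;
    }

private:
    std::atomic<float> pending{kNoPendingChange};
};
```

With a `NeuralModel* model`, the audio callback would call `RequestFromAudioThread(*model, newScaleFactor)` and a background thread would periodically call `ApplyFromWorkerThread(*model)`.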
To get the current quality scaling factor for a model, do:
float scaleFactor = model->GetQualityScaleFactor();
To retrieve arbitrary metadata fields from models that contain them, do:
std::string fieldName = "this_is_a_field_name";
std::string metadataValue = model->GetMetadata(fieldName);
if (!metadataValue.empty())
{
// do something
}
Results are always returned as strings. Field names are case sensitive.
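Because values come back as strings, numeric fields have to be parsed by the caller. The helper below is a small sketch of one way to do that with std::strtof; `TryParseMetadataFloat` is a hypothetical name and not part of NeuralAudio. It treats an empty string (missing field) and partially numeric text as failures.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of NeuralAudio): parses a metadata string as
// a float. Returns false and leaves *result untouched when the string is
// empty or is not consumed entirely by std::strtof.
inline bool TryParseMetadataFloat(const std::string& value, float* result)
{
    assert(result != nullptr);

    if (value.empty())
        return false;

    const char* begin = value.c_str();
    char* end = nullptr;
    const float parsed = std::strtof(begin, &end);

    // strtof leaves `end` at the first character it did not parse.
    if (end == begin || *end != '\0')
        return false;

    *result = parsed;
    return true;
}
```

Usage would look like `float value; if (TryParseMetadataFloat(model->GetMetadata("some_numeric_field"), &value)) { ... }`, where "some_numeric_field" is a placeholder for whatever field your models actually contain.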
To get the model version string, do:
std::string version = model->GetModelVersion();
The string will be empty if no version information exists.
WaveNet models have a fixed receptive field size (ie: size of the input that the output depends on).
To get this value, do:
int receptiveFieldSamples = model->GetReceptiveFieldSize();
Note that this can return -1, which means that the receptive field size is unknown or not fixed (e.g., LSTM models technically have an infinite tail because of their internal feedback loop).
This method is only supported for "internal" and NAM Core models. For RTNeural it will always return -1.
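The receptive field is reported in samples, so turning it into a duration or a number of audio blocks is plain arithmetic. The functions below sketch both conversions and pass the -1 "unknown" case through. `ReceptiveFieldMilliseconds` and `ReceptiveFieldBlockCount` are hypothetical helper names, and the sample rate and block size are values you supply.

```cpp
#include <cassert>

// Converts a receptive field size in samples to milliseconds.
// Returns -1.0 when the size is unknown or not fixed (negative input).
inline double ReceptiveFieldMilliseconds(int receptiveFieldSamples, double sampleRate)
{
    assert(sampleRate > 0.0);

    if (receptiveFieldSamples < 0)
        return -1.0;

    return 1000.0 * receptiveFieldSamples / sampleRate;
}

// Number of blocks of blockSize samples needed to cover the receptive field
// (rounded up). Returns -1 when the size is unknown or not fixed.
inline int ReceptiveFieldBlockCount(int receptiveFieldSamples, int blockSize)
{
    assert(blockSize > 0);

    if (receptiveFieldSamples < 0)
        return -1;

    return (receptiveFieldSamples + blockSize - 1) / blockSize;
}
```

For example, `ReceptiveFieldMilliseconds(model->GetReceptiveFieldSize(), 48000.0)` gives the length of input that can still influence the current output at a 48 kHz sample rate.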
First clone the repository:
git clone --recurse-submodules https://github.com/mikeoliphant/NeuralAudio
cd NeuralAudio/build
Then compile the plugin using:
Linux/MacOS
cmake .. -DCMAKE_BUILD_TYPE="Release"
make -j4
Windows
cmake.exe -G "Visual Studio 17 2022" -A x64 ..
cmake --build . --config=release -j4
Note - you'll have to change the Visual Studio version if you are using a different one.
-DBUILD_NAMCORE=ON|OFF: Support loading models using the NAM Core implementations.
-DNAM_USE_INLINE_GEMM=ON|OFF: Enable use of inline matrix multiplication in NAM Core.
-DNAM_ENABLE_A2_FAST=ON|OFF: Enable use of A2 fast path wavenet in NAM Core.
-DBUILD_STATIC_RTNEURAL=ON|OFF: Build static RTNeural model architectures (slower compile, larger size - only use if you plan on forcing RTNeural model loading).
-DBUILD_INTERNAL_STATIC_WAVENET=ON|OFF: Build internal static WaveNet model architectures (faster internal WaveNet, but slower compile, larger size).
-DBUILD_INTERNAL_STATIC_LSTM=ON|OFF: Build internal static LSTM model architectures (faster internal LSTM, but slower compile, larger size).
-DDEFAULT_QUALITY_SCALE="X.X": Default model quality scale factor (0.0 to 1.0). Be sure to use quotes around value. Defaults to "1.0".
-DDEFAULT_INPUT_DBU="XX": Default dBu level for model input calibration.
-DWAVENET_FRAMES=XXX: Sample buffer size for the internal WaveNet implementation. Defaults to 64. If you know you are going to be using a fixed sample buffer smaller or larger than this, use that instead. Note that the model will still be able to process any buffer size - it is just optimized for this size.
-DBUFFER_PADDING=XXX: Amount of padding to convolution layer buffers. This allows ring buffer resets to be staggered across layers to improve performance. It also uses a significant amount of memory. It is set to 24 by default. It can be set all the way down to 0 to reduce memory usage.
-DWAVENET_MATH=XXX
-DLSTM_MATH=XXX: Which math approximations (tanh and sigmoid) to use for WaveNet and LSTM models. Options are:
- FastMath (the default): Use the same approximations as NAM Core.
- EigenMath: Use Eigen's builtin tanh approximation. Somewhat slower, but more accurate.
- StdMath: Use standard math functions. No approximation used - much slower.
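To give a feel for the speed/accuracy trade-off these options control, here is a generic rational tanh approximation compared against std::tanh. This is only an illustration: it is NOT the approximation used by NAM Core, Eigen or NeuralAudio, and all function names are hypothetical. For this particular formula the worst-case absolute error is on the order of 0.02.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative rational tanh approximation. This is NOT the approximation
// used by NAM Core or NeuralAudio; it only demonstrates the trade-off.
inline float ApproxTanh(float x)
{
    const float x2 = x * x;
    const float y = x * (27.0f + x2) / (27.0f + 9.0f * x2);

    // The rational form overshoots +/-1 for |x| > 3, so clamp it.
    return std::max(-1.0f, std::min(1.0f, y));
}

// Sigmoid expressed through tanh: sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
inline float ApproxSigmoid(float x)
{
    return 0.5f * (1.0f + ApproxTanh(0.5f * x));
}

// Largest absolute error of ApproxTanh against std::tanh, sampled at
// `steps` evenly spaced points over [-range, range].
inline float MaxTanhError(float range, int steps)
{
    assert(range > 0.0f && steps > 1);

    float worst = 0.0f;

    for (int i = 0; i < steps; ++i)
    {
        const float x = -range + 2.0f * range * i / (steps - 1);
        worst = std::max(worst, std::fabs(ApproxTanh(x) - std::tanh(x)));
    }

    return worst;
}
```

Calling `MaxTanhError(8.0f, 2001)` measures the error over a typical activation range. A similar measurement against real model output is what the accuracy tools are for.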
-DBUILD_UTILS=ON|OFF: Build performance/accuracy testing tools (located in the "Utils" folder).
The following applications and devices are using the NeuralAudio library for model processing:
- neural-amp-modeler-lv2: LV2 plugin for using neural network machine learning amp models.
- stompbox: Guitar amplification and effects pedalboard simulation.
- NeuralRack: Neural Model and Impulse Response File loader for Linux/Windows.
- Darkglass Anagram: Bass guitar effects unit.
- neural_tilde: Max/MSP external for running neural amplifier captures