Fix outdated AudioSet link, shared memory, and TypeError by ayaadev · Pull Request #5 · CoreWorxLab/openwakeword-training

ayaadev · 2026-03-27T11:18:09Z

The link for the AudioSet download has been outdated since October 16th, 2025, when the commit "Convert to Parquet format" landed. The breaking commit is here: https://huggingface.co/datasets/agkphysics/AudioSet/commit/0c609e8302cf139307f639c57652032af0a88041

I've pinned the link to a specific revision where the bal_train09.tar file was still present. I've tested this on my install, and the setup-data.sh file works now.
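For readers unfamiliar with revision pinning, here is a minimal sketch of how a pinned Hugging Face download URL is put together. It relies on the public `resolve/<revision>/<path>` URL layout. The function name `pinned_dataset_url` is hypothetical, the `data/bal_train09.tar` path is an assumption about where the file sat in the repository, and the revision is a placeholder because the exact hash used in this PR is not quoted here.

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"


def pinned_dataset_url(repo_id: str, revision: str, path_in_repo: str) -> str:
    """Build a 'resolve' URL for one dataset file at a fixed revision.

    Using a commit hash instead of 'main' keeps the link valid even if
    later commits move or delete the file.
    """
    return (
        f"{HF_BASE}/datasets/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(path_in_repo)}"
    )


if __name__ == "__main__":
    # Placeholder: substitute the hash of a commit that still contains the tar file.
    revision = "REPLACE_WITH_COMMIT_HASH"
    # The file path below is assumed, not taken from the PR diff.
    print(pinned_dataset_url("agkphysics/AudioSet", revision, "data/bal_train09.tar"))
```

The same idea applies to any file fetched with `wget` or `curl` in setup-data.sh: only the revision segment of the URL changes.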

Additionally, when running the train.py file, I got an error saying that the container ran out of shared memory. I fixed this by adding ipc: "host" to the Docker containers.
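For context, Docker gives each container a 64 MB `/dev/shm` by default, and `ipc: "host"` makes the container use the host's shared memory instead. The sketch below is one way to check how much shared memory is available before starting training. It is not part of this PR; the helper names and the 1024 MiB threshold are illustrative assumptions.

```python
import os
import shutil
from typing import Optional


def shm_free_mib(path: str = "/dev/shm") -> Optional[float]:
    """Return free space under `path` in MiB, or None if the path is missing."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free / (1024 * 1024)


def has_enough_shm(required_mib: float, path: str = "/dev/shm") -> bool:
    """True when `path` exists and has at least `required_mib` MiB free."""
    free = shm_free_mib(path)
    return free is not None and free >= required_mib


if __name__ == "__main__":
    free = shm_free_mib()
    if free is None:
        print("/dev/shm not found (not a Linux container?)")
    else:
        print(f"/dev/shm free: {free:.0f} MiB")
        # 1024 MiB is an arbitrary illustrative threshold, not a measured requirement.
        if not has_enough_shm(1024):
            print("Shared memory looks small; data loading may fail.")
```

Running this inside the container before and after adding `ipc: "host"` should show whether the setting took effect.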

I also encountered the following TypeError, so I followed the first outlined method and pinned the protobuf package to 3.20.x in the requirements.txt file.

TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
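To confirm that the pin took effect in a given environment, a small check along these lines may help. It is a sketch, not part of this PR: the helper names are hypothetical, and "3.20.x or lower" is interpreted as a major.minor version of at most 3.20, following workaround 1 in the error message.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def major_minor(version_str: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.20.3'."""
    match = re.match(r"(\d+)\.(\d+)", version_str)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_str!r}")
    return int(match.group(1)), int(match.group(2))


def within_workaround_range(version_str: str) -> bool:
    """True if the version is 3.20.x or lower (workaround 1 in the error)."""
    return major_minor(version_str) <= (3, 20)


def installed_protobuf_version() -> Optional[str]:
    """Return the installed protobuf version, or None if it is not installed."""
    try:
        return version("protobuf")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_protobuf_version()
    if installed is None:
        print("protobuf is not installed")
    elif within_workaround_range(installed):
        print(f"protobuf {installed} is within the 3.20.x-or-lower range")
    else:
        print(f"protobuf {installed} is newer than 3.20.x; the TypeError may appear")
```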

When the model finished training, I was initially alarmed to see that the model file was only about 400K, and I thought something had gone wrong. However, after testing, I could see that it was working as intended. I've therefore added a line to the documentation that makes the expected file size clearer to the reader.

With all these changes, I was able to train my wake word successfully!

Thanks for an amazing project. This is the only project that actually worked to generate a wake word.

Regards.

This fixes the following error. Method one was used.

TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

ayaadev added 3 commits March 27, 2026 11:14

Fix outdated AudioSet link

95b35a3

Pin the link to a specific revision of the HuggingFace repository.

Set IPC to host in Docker compose

3fb4f75

This fixes an issue where the container runs out of shared memory.

ayaadev changed the title ~~Fix outdated AudioSet link~~ Fix outdated AudioSet link, shared memory, and TypeError Mar 28, 2026

Inform the user that the finished model could be small

7eb8627

ayaadev wants to merge 4 commits into CoreWorxLab:main from ayaadev:main
