
Fix outdated AudioSet link, shared memory, and TypeError #5

Open
ayaadev wants to merge 4 commits into CoreWorxLab:main from ayaadev:main

ayaadev commented Mar 27, 2026

The AudioSet download link has been outdated since October 16th, 2025, because of the commit "Convert to Parquet format". See the breaking commit here: https://huggingface.co/datasets/agkphysics/AudioSet/commit/0c609e8302cf139307f639c57652032af0a88041

I've pinned the link to a specific revision in which the bal_train09.tar file was still present. I've tested this on my install, and setup-data.sh now works.
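To illustrate what pinning means here, the sketch below builds a Hugging Face `resolve` URL that references a fixed commit instead of `main`. The revision this PR actually pins to is not stated in this description, so `REVISION` is a placeholder. The helper name `pinned_dataset_url` is hypothetical, and so is the in-repo path `bal_train09.tar` (the real file may sit in a subdirectory). The revision should be a commit from before the "Convert to Parquet format" change, where the file still existed.

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"

# Placeholder (assumption): substitute a commit hash from before the
# "Convert to Parquet format" change.
REVISION = "<commit-hash>"


def pinned_dataset_url(repo_id: str, revision: str, filename: str) -> str:
    """Build a Hugging Face dataset 'resolve' URL pinned to one revision (hypothetical helper)."""
    if not revision or revision == "main":
        # 'main' follows the branch head, which is what broke the original link.
        raise ValueError("pass an explicit commit hash, not 'main'")
    # quote() leaves '/' intact by default, so nested in-repo paths still work.
    return f"{HF_BASE}/datasets/{repo_id}/resolve/{quote(revision)}/{quote(filename)}"


if __name__ == "__main__":
    print(pinned_dataset_url("agkphysics/AudioSet", REVISION, "bal_train09.tar"))
```

Because the URL points at an immutable commit, later reorganizations of the dataset repository cannot break it. The trade-off is that the download will not pick up any future fixes to the files.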

Additionally, when running train.py, I got an error saying that the container had run out of shared memory. I fixed this by adding ipc: "host" to the Docker containers.
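For context, Docker gives a container a 64 MB `/dev/shm` by default, and multi-process data loading during training can exhaust it. With `ipc: "host"` the container uses the host's IPC namespace, including its `/dev/shm`. Compose's `shm_size` option is an alternative that raises the limit without sharing the host namespace. The sketch below is not part of the PR. It assumes a Linux container, and the function name `shm_size_mb` is hypothetical. It reports the size of `/dev/shm` so you can confirm from inside the container that the change took effect.

```python
import os
import shutil


def shm_size_mb(path: str = "/dev/shm"):
    """Return the total size of the shared-memory mount in MB, or None if it is absent."""
    if not os.path.isdir(path):
        return None  # e.g. macOS/Windows hosts have no /dev/shm
    total_bytes = shutil.disk_usage(path).total
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    size = shm_size_mb()
    if size is None:
        print("/dev/shm not found")
    elif size <= 64:
        # Matches Docker's default; training with several worker processes may fail.
        print(f"/dev/shm is only {size:.0f} MB")
    else:
        print(f"/dev/shm is {size:.0f} MB")
```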

I also encountered the following TypeError, so I followed the first workaround it suggests and pinned the protobuf package to 3.20.x in requirements.txt.

TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
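You can verify the pin from inside the container. The sketch below is an illustration, not code from the PR. The helper names are hypothetical, and it only encodes the rule stated in the error message (protobuf 3.20.x or lower). In requirements.txt, a PEP 440 wildcard such as `protobuf==3.20.*` is one way to express the pin. The exact specifier used in this PR may differ.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(ver: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.20.3' or '4.21.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def protobuf_is_compatible(ver: str) -> bool:
    """True if the version satisfies 'protobuf 3.20.x or lower'."""
    return major_minor(ver) <= (3, 20)


def installed_protobuf_ok():
    """Check the installed protobuf package; return None if it is not installed."""
    try:
        return protobuf_is_compatible(version("protobuf"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("protobuf OK:", installed_protobuf_ok())
```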

When the model finished training, I was initially alarmed to see that the model file was only about 400 KB, and I assumed something had gone wrong. However, after testing, I confirmed that it works as intended, so I've added a line to the documentation that makes this clearer to the reader.

With all these changes, I was able to train my wake word successfully!

Thanks for an amazing project. This is the only project that actually worked to generate a wake word.

Regards.

ayaadev added 3 commits March 27, 2026 11:14
Pin the link to a specific revision of the HuggingFace repository.
This fixes an issue where the container runs out of shared memory.
This fixes the protobuf error "TypeError: Descriptors cannot be created directly." Method one (downgrading protobuf to 3.20.x) was used.
ayaadev changed the title from "Fix outdated AudioSet link" to "Fix outdated AudioSet link, shared memory, and TypeError" on Mar 28, 2026