Fix outdated AudioSet link, shared memory, and TypeError #5
ayaadev wants to merge 4 commits into
Pin the link to a specific revision of the HuggingFace repository.
This fixes an issue where the container runs out of shared memory.
This fixes the following error, using the first workaround the message suggests:
TypeError: Descriptors cannot be created directly. If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. If you cannot immediately regenerate your protos, some other possible workarounds are: 1. Downgrade the protobuf package to 3.20.x or lower. 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
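The first workaround amounts to keeping `protobuf` on the 3.20 series, for example with a requirement line such as `protobuf==3.20.*` (the exact line added to `requirements.txt` is not shown here). A minimal sketch, assuming you want to confirm inside the container which version was actually installed; the helper names are hypothetical and not part of the project:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_protobuf_version() -> Optional[str]:
    """Return the installed protobuf version string, or None if missing."""
    try:
        return version("protobuf")
    except PackageNotFoundError:
        return None


def is_protobuf_3_20(ver: str) -> bool:
    """Return True for any 3.20.x version string (the 3.20.x pin)."""
    return ver.split(".")[:2] == ["3", "20"]


def check_protobuf() -> bool:
    """Print a hint and return False if protobuf is missing or not 3.20.x."""
    ver = installed_protobuf_version()
    if ver is None or not is_protobuf_3_20(ver):
        print(f"protobuf {ver!r} found; expected 3.20.x (see requirements.txt)")
        return False
    return True
```

Running `check_protobuf()` before the generated `_pb2` modules are imported reports a version mismatch directly, instead of surfacing it as the descriptor error above.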
The AudioSet download link has been broken since October 16th, 2025, when the commit "Convert to Parquet format" landed. See the breaking commit here: https://huggingface.co/datasets/agkphysics/AudioSet/commit/0c609e8302cf139307f639c57652032af0a88041
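One way to avoid this kind of breakage is to point the download at a fixed commit instead of the moving `main` branch. HuggingFace serves repository files at `https://huggingface.co/datasets/<repo_id>/resolve/<revision>/<path>`, where `<revision>` can be a commit hash, branch, or tag. A minimal sketch of building such a URL; the function name is hypothetical and not taken from `setup-data.sh`:

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"


def hf_dataset_file_url(repo_id: str, filename: str, revision: str) -> str:
    """Build a download URL for `filename` in a HuggingFace dataset repo,
    pinned to `revision` (a commit hash, branch, or tag) instead of `main`."""
    # Encode "/" inside the revision (e.g. "refs/convert/parquet") but keep it
    # in the file path so nested paths such as "data/x.tar" still resolve.
    rev = quote(revision, safe="")
    path = quote(filename)
    return f"{HF_BASE}/datasets/{repo_id}/resolve/{rev}/{path}"
```

Usage would look like `hf_dataset_file_url("agkphysics/AudioSet", "bal_train09.tar", sha)`, where `sha` is a placeholder for a commit from before the Parquet conversion and the in-repo path of the tarball is an assumption. If `huggingface_hub` is installed, its `hf_hub_url` helper (with `repo_type="dataset"` and `revision=...`) builds the same kind of URL.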
I've pinned the link to a specific revision where the `bal_train09.tar` file was still present. I've tested this on my install, and the `setup-data.sh` file works now.

Additionally, when running the `train.py` file, I got an error saying that the container ran out of shared memory. I fixed this by adding `ipc: "host"` to the Docker containers.

I also hit the `TypeError: Descriptors cannot be created directly` error, so I followed the first workaround it suggests and pinned the `protobuf` package to `3.20.x` in the `requirements.txt` file.

When the model finished training, I was initially alarmed to see that the model file was only roughly 400 KB, and I thought there was an error. However, after testing, I confirmed that it works as intended. Therefore, I've added a line to the documentation that states this more clearly for the reader.
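For context on the shared-memory error: Docker gives containers a 64 MB `/dev/shm` by default, and training code that passes batches between worker processes (PyTorch's `DataLoader` with `num_workers > 0` is a common example; whether `train.py` does this is an assumption) can exhaust it. `ipc: "host"` sidesteps the limit by sharing the host's IPC namespace; raising `shm_size` in the Compose file is another common option, not the one used here. A minimal sketch of a pre-flight check, with a hypothetical 1 GiB threshold and hypothetical helper names:

```python
import shutil
from pathlib import Path

# Hypothetical rule-of-thumb threshold; Docker's default /dev/shm (64 MB)
# is far below it.
MIN_SHM_BYTES = 1024**3  # 1 GiB


def shm_free_bytes(path: str = "/dev/shm") -> int:
    """Return free bytes on the shared-memory mount, or 0 if it is missing."""
    mount = Path(path)
    if not mount.exists():
        return 0
    return shutil.disk_usage(str(mount)).free


def check_shm(path: str = "/dev/shm", min_bytes: int = MIN_SHM_BYTES) -> bool:
    """Warn and return False if fewer than `min_bytes` are free at `path`."""
    free = shm_free_bytes(path)
    if free < min_bytes:
        print(
            f"warning: only {free / 1024**2:.0f} MiB free in {path}; "
            'consider ipc: "host" (or a larger shm_size) for the container'
        )
        return False
    return True
```

Calling `check_shm()` at the start of training turns an opaque mid-run crash into an early, explicit warning.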
With all these changes, I was able to train my wake word successfully!
Thanks for an amazing project. This is the only project that actually worked to generate a wake word.
Regards.