Skip to content

Fix code discrepancies encountered during the open-sourcing process#6

Open
hyp1231 wants to merge 13 commits into
google-deepmind:mainfrom
hyp1231:main
Open

Fix code discrepancies encountered during the open-sourcing process#6
hyp1231 wants to merge 13 commits into
google-deepmind:mainfrom
hyp1231:main

Conversation

@hyp1231

@hyp1231 hyp1231 commented Sep 8, 2025

Copy link
Copy Markdown

Dataset Handling and Parsing Improvements:

  • Improved robustness of the parse_gz function in genrec/datasets/AmazonReviews2014/dataset.py to handle malformed JSON lines.
  • Fixed usage of tqdm and function calls in metadata extraction and loading.

Configuration and Pipeline Updates:

  • Updated default configuration in genrec/default.yaml to increase eval_batch_size to 128 and patience for early stopping to 50 epochs.
  • Changed the sent_emb_pca parameter in genrec/models/ActionPiece/config.yaml to -1, disabling PCA by default.
  • Updated typing in genrec/pipeline.py to use Union for constructor parameters.

Model and Tokenizer Fixes:

  • Fixed a bug in ActionPieceTokenizer by removing the unused n_prob_encode_plus parameter and correcting a config key from rq_n_codebooks to pq_n_codebooks.
  • Added a batch shape check in the generate method of the ActionPiece model to prevent shape mismatches during ensemble inference.

Utility Function Enhancements:

  • Improved the log function in genrec/utils.py.
  • Enhanced _convert_value in genrec/utils.py to more robustly parse complex types using eval with safety checks.
  • Fixed dynamic loading in get_tokenizer and get_trainer to correctly return the desired class or a fallback.

Code Cleanup:

  • Removed unused imports in several files, including genrec/dataset.py and genrec/utils.py.

@google-cla

google-cla Bot commented Sep 8, 2025

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@koalaaaaaaaaa

Copy link
Copy Markdown

Hi, taking all above fixes into account, I still find a bug in parameters. To be exact, many params loaded from config are lists which are supposed to be single float or str, I hope this can be fixed too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants