Skip to content

modify pytaxonkit.py to handle taxonkit's fuzzy matching#53

Merged
standage merged 9 commits into
bioforensics:masterfrom
DavidBSauer:fuzzy_testing
Apr 6, 2026
Merged

modify pytaxonkit.py to handle taxonkit's fuzzy matching#53
standage merged 9 commits into
bioforensics:masterfrom
DavidBSauer:fuzzy_testing

Conversation

@DavidBSauer

Copy link
Copy Markdown
Contributor

Hi,
I've made a modification to name2taxid to support taxonkit's fuzzymatching (available since 0.17.0).

ps: Please let me know how I can improve. This is my first attempt at a pull request. Aside from modifying the code, I am unfamiliar with the contribution/merge process.

Best wishes,
David

@DavidBSauer

DavidBSauer commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

Also, I've added an additional function to validate the number of fuzzy returns (validate_n) based on the existing validate_threads.

Two thoughts

  • These could be combined in principle, as they are both testing for valid integers >0. But need a way to record which variable has the problem if there is an error.
  • As-is, I think these functions have a problem. They're meant to throw a warning and cause taxonkit to use its default value. However, these functions return a None. This cannot be converted to a string, so it will instead cause an error.

@standage

standage commented Apr 2, 2026

Copy link
Copy Markdown
Member

Hi @DavidBSauer! Thanks for your interest and your contribution. Your code looks pretty good after a brief first glance, but I'll want to do a bit of testing before I can make any conclusive comments. It may be a couple of days before I can get to this. Thanks for your patience, and stand by!

@standage

standage commented Apr 2, 2026

Copy link
Copy Markdown
Member

In the meantime, I just merged a minor update to fix a test that started failing recently due to an update to the NCBI taxonomy database. I'll go ahead and approve the test suite to run for this PR, but it will likely fail until you rebase.

@standage

standage commented Apr 2, 2026

Copy link
Copy Markdown
Member

Welp, it looks like the CI is failing on a style check before even running the test suite. If you run make format in your dev environment it will autoformat the code for you.

@DavidBSauer

Copy link
Copy Markdown
Contributor Author

Hi @standage think I've got it reformatted with "make format". Please try this version

@standage

standage commented Apr 2, 2026

Copy link
Copy Markdown
Member

I'm not sure how running make format ended up making so many changes, including to files it typically doesn't touch. But I was able to reset to the version of your code at e4068ea and run make format to fix the very small style issues. I also merged the latest updates to the master branch into this branch. Evaluation of the substance of this PR is still pending!

@standage

standage commented Apr 6, 2026

Copy link
Copy Markdown
Member

I tested the feature on the example provided in shenwei356/taxonkit#88 when the feature was first implemented in TaxonKit. Looks good to me! I went ahead and codified this example as a test in the automated test suite.

Thanks for this contribution @DavidBSauer!

@standage standage merged commit c001db7 into bioforensics:master Apr 6, 2026
4 checks passed
@DavidBSauer DavidBSauer deleted the fuzzy_testing branch April 10, 2026 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants