
Extraneous Lines in samples.txt #13

Merged
standage merged 2 commits into main from duplicate_samples_txt on Apr 3, 2026

@RyanBerger98 commented Apr 2, 2026

Collaborator

This PR fixes a recently encountered bug.

When ezfastq runs, it automatically generates a samples.txt file containing the names of all samples whose files were copied over. If ezfastq is run multiple times with the same working directory, it appends sample names to this file rather than overwriting it, and it avoids writing duplicate sample names. However, if ezfastq is run multiple times with the same samples, empty lines are appended to the file.
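A minimal sketch of how the blank lines can arise, assuming samples.txt is written with the same `print(*added_samples, sep="\n", file=fh)` call quoted from ezfastq/api.py but without a length check. The function name `write_samples_unguarded` is hypothetical, and an in-memory buffer stands in for the file opened in append mode.

```python
import io


def write_samples_unguarded(fh, added_samples):
    # Hypothetical reconstruction of the pre-fix pattern: no check for an empty list.
    print(*added_samples, sep="\n", file=fh)


buf = io.StringIO()  # stands in for samples.txt opened with mode "a"
write_samples_unguarded(buf, ["test1", "test2", "test3"])  # first run
write_samples_unguarded(buf, [])  # second run: every sample is a duplicate
print(repr(buf.getvalue()))  # 'test1\ntest2\ntest3\n\n'
```

Calling `print()` with no positional arguments still writes its `end` string (a newline by default), so each run that adds no new samples leaves one empty line behind.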

Closes #12

@RyanBerger98 left a comment

Collaborator Author


@standage this is ready for review!

Comment thread ezfastq/api.py
Comment on lines +38 to +40
```python
if len(added_samples) > 0:
    with open(workdir / "samples.txt", "a") as fh:
        print(*added_samples, sep="\n", file=fh)
```

Collaborator Author


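This is the fix: only open samples.txt to write sample names if at least one new sample was copied over. Without the check, `print(*added_samples, sep="\n", file=fh)` with an empty `added_samples` still writes its trailing newline, which is where the empty lines came from.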
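A self-contained sketch of the append-without-duplicates behavior with the guard in place. The helper name `append_new_samples` and the way existing names are read back are assumptions for illustration, not ezfastq's actual code; only the guarded `print` call mirrors the excerpt from ezfastq/api.py. The usage at the end mimics two runs with the same samples.

```python
import tempfile
from pathlib import Path


def append_new_samples(workdir: Path, samples: list[str]) -> list[str]:
    """Hypothetical helper: append sample names not already listed in samples.txt."""
    path = workdir / "samples.txt"
    existing = set()
    if path.exists():
        # Collect names already on disk, ignoring any blank lines.
        existing = {line.strip() for line in path.read_text().splitlines() if line.strip()}
    added_samples = [name for name in samples if name not in existing]
    # The guard: only touch the file when there is something new to write.
    if len(added_samples) > 0:
        with open(path, "a") as fh:
            print(*added_samples, sep="\n", file=fh)
    return added_samples


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    first = append_new_samples(workdir, ["test1", "test2", "test3"])
    second = append_new_samples(workdir, ["test1", "test2", "test3"])  # all duplicates
    lines = (workdir / "samples.txt").read_text().splitlines()

print(first, second, lines)
```

On the second call nothing is new, so the file is never opened and samples.txt keeps exactly three lines.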

Comment thread ezfastq/tests/test_cli.py
Comment on lines +91 to +101
```python
def test_duplicate_samples(tmp_path):
    seq_path = files("ezfastq") / "tests" / "data" / "flat"
    arglist = [seq_path, "test1", "test2", "test3", "--workdir", tmp_path]
    cli.main(arglist)
    with open(tmp_path / "samples.txt", "r") as fh:
        num_lines = len(fh.readlines())
    assert num_lines == 3
    cli.main(arglist)
    with open(tmp_path / "samples.txt", "r") as fh:
        num_lines = len(fh.readlines())
    assert num_lines == 3
```

Collaborator Author


Regression test. This fails on the main branch but passes here.

@RyanBerger98 requested a review from standage April 2, 2026 12:53

@RyanBerger98 left a comment

Collaborator Author


@standage now this is ready for review

Comment thread ezfastq/namemap.py
Comment on lines +25 to +27
```python
if line.strip():
    old_name, new_name = cls.parse_name(line, sep="\t")
    name_map[old_name] = new_name
```

Collaborator Author


The fix for #12. A line is only parsed if it still has content after surrounding whitespace is stripped, so blank lines are skipped.
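A standalone sketch of the blank-line guard in a tab-separated name-map parser. `parse_name_map` is a hypothetical stand-in; it does not reproduce `cls.parse_name`, whose exact behavior is not shown here, and it assumes each non-blank line holds an old and a new name separated by a tab.

```python
def parse_name_map(lines, sep="\t"):
    """Hypothetical stand-in for the parsing loop in ezfastq/namemap.py."""
    name_map = {}
    for line in lines:
        # Skip blank or whitespace-only lines; they carry no mapping.
        if line.strip():
            old_name, new_name = (part.strip() for part in line.split(sep, 1))
            name_map[old_name] = new_name
    return name_map


sample_file = "old1\tnew1\n\nold2\tnew2\n   \n"
mapping = parse_name_map(sample_file.splitlines())
print(mapping)  # {'old1': 'new1', 'old2': 'new2'}
```

In this sketch, dropping the `if line.strip():` check makes an empty line split into a single field (`"".split("\t", 1)` returns `['']`), and unpacking that into two names raises `ValueError`.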

Comment thread ezfastq/tests/test_cli.py
Comment on lines 70 to 89
```
@@ -86,3 +87,16 @@ def test_fq_command(tmp_path):
    arglist = ["ezfastq", seq_path, "test1", "test2", "test3", "--workdir", tmp_path]
    run(arglist)
    assert len(list((tmp_path / "seq").glob("*_R?.fastq.gz"))) == 6
```

Collaborator Author


Updated this test to ensure that empty lines in the input sample file don't break ezfastq.

@standage left a comment

Member


LGTM, thanks!

@standage merged commit 2bc7267 into main Apr 3, 2026
4 checks passed
@standage deleted the duplicate_samples_txt branch April 3, 2026 17:39

Development

Successfully merging this pull request may close these issues.

Issues with samples.txt handling

2 participants