When the same supertask (or a different supertask sharing some pretasks) is assigned to a hashlist that has already had those exact attacks fully exhausted, Hashtopolis re-instantiates every pretask as a new Task. The duplicate tasks then re-run the same attackCmd + same files + same cracker against the same hashes, burning GPU-hours for zero new coverage.
This is easy to hit in practice with phased workflows: a "quick" supertask runs to completion → no cracks → operator follows up with a broader "deep" supertask that includes the quick pretasks as a subset. Today, the only way to avoid the redundant runs is to manually identify duplicates and archive the tasks post-assignment (we just did this for a hashlist in our deployment — 9 of 40 tasks in the second wrapper were exact duplicates of completed tasks in the first).
Proposed behavior
Add an opt-in "Skip pretasks already completed against this hashlist" option at supertask → hashlist assignment time (and pretask → hashlist assignment, for consistency).
Default: OFF — preserves current behavior for all existing API consumers and UI flows.
When enabled, for each pretask in the supertask, check whether an equivalent fully-exhausted task already exists on the target hashlist. If yes, skip instantiation.
Surface the result on the resulting TaskWrapper page: "Skipped N pretasks (already completed): ".
Match criteria — what counts as "the same attack"
A pretask P is considered an exact duplicate of an existing Task T on hashlist H if all of the following hold:
T.attackCmd == P.attackCmd (normalized for whitespace / rule-flag ordering, if relevant).
The set of fileIds associated with T (via FileTask) equals the set associated with P (via FilePretask).
T.crackerBinaryId == P.crackerBinaryId AND T.crackerBinaryTypeId == P.crackerBinaryTypeId (a hashcat version change invalidates the dedup — conservative).
T.keyspace > 0 AND T.keyspaceProgress >= T.keyspace (fully exhausted; partials are NOT skipped — the remaining keyspace is still valuable work).
T.taskWrapper.hashlistId == H.
Archived tasks (T.isArchived = 1) count as matches if they meet the above — an archived-but-exhausted task already did its work.
Explicitly NOT skipped
Tasks with keyspace = 0 (never started / never benchmarked).
Tasks still in progress on another wrapper (only completed matches qualify).
Tasks with a different cracker binary id/type.
Tasks where any file in the set differs by fileId.
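The criteria above can be read as a single predicate. The following is a minimal Python sketch, not Hashtopolis code: `PretaskSpec`, `TaskRow`, and `normalize_attack_cmd` are hypothetical names, and the normalization is assumed to collapse whitespace only (reordering flags would need real hashcat argument parsing).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PretaskSpec:
    """Hypothetical view of a pretask plus the cracker chosen at assignment."""
    attack_cmd: str
    file_ids: FrozenSet[int]
    cracker_binary_id: int
    cracker_binary_type_id: int


@dataclass(frozen=True)
class TaskRow:
    """Hypothetical view of an existing Task joined with its TaskWrapper."""
    attack_cmd: str
    file_ids: FrozenSet[int]
    cracker_binary_id: int
    cracker_binary_type_id: int
    keyspace: int
    keyspace_progress: int
    hashlist_id: int
    is_archived: bool = False


def normalize_attack_cmd(cmd: str) -> str:
    # Assumption: only whitespace is normalized; flag order is left untouched.
    return " ".join(cmd.split())


def is_exact_duplicate(pretask: PretaskSpec, task: TaskRow, hashlist_id: int) -> bool:
    # is_archived is deliberately not consulted: archived tasks still count.
    return (
        task.hashlist_id == hashlist_id
        and normalize_attack_cmd(task.attack_cmd) == normalize_attack_cmd(pretask.attack_cmd)
        and task.file_ids == pretask.file_ids
        and task.cracker_binary_id == pretask.cracker_binary_id
        and task.cracker_binary_type_id == pretask.cracker_binary_type_id
        and task.keyspace > 0                        # never started: no match
        and task.keyspace_progress >= task.keyspace  # partial runs: no match
    )
```

Comparing `frozenset`s makes the file check order-independent, which matches the "set of fileIds" wording.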
Centralized match function
To keep this consistent across call sites, the matching logic should live in a single utility (e.g., TaskUtils::findCompletedEquivalent($hashlistId, $pretaskOrSpec, $crackerBinaryId)), called from:
Future read-only surfaces (see "Out of scope for v1" below)
A complementary TaskUtils::listCompletedAttacks($hashlistId) would return the distinct set of (attackCmd, fileIdSet, crackerBinaryId) tuples fully exhausted against a hashlist — useful both as a read-only operator view and as the inverse of the dedup query.
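As an illustration of what the complementary listing might return, here is a small in-memory Python sketch. `TaskRow` and `list_completed_attacks` are hypothetical stand-ins for the real models and for `TaskUtils::listCompletedAttacks`; a real implementation would query the database rather than iterate over rows.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple


class TaskRow(NamedTuple):
    """Hypothetical flattened Task row (wrapper's hashlist already joined in)."""
    hashlist_id: int
    attack_cmd: str
    file_ids: FrozenSet[int]
    cracker_binary_id: int
    keyspace: int
    keyspace_progress: int


AttackKey = Tuple[str, FrozenSet[int], int]


def list_completed_attacks(tasks: Iterable[TaskRow], hashlist_id: int) -> Set[AttackKey]:
    # Distinct (attackCmd, fileIdSet, crackerBinaryId) tuples that were fully
    # exhausted against the given hashlist. Whitespace is collapsed so that
    # trivially different command strings land on the same tuple.
    return {
        (" ".join(t.attack_cmd.split()), t.file_ids, t.cracker_binary_id)
        for t in tasks
        if t.hashlist_id == hashlist_id
        and t.keyspace > 0
        and t.keyspace_progress >= t.keyspace
    }
```

Because the result is a set of hashable tuples, the dedup check for a given pretask reduces to a membership test against it.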
Backend sketch
function assignSupertaskToHashlist($supertaskId, $hashlistId, $crackerBinaryId, $skipCompleted = false):
    wrapper = TaskWrapper::create(...)
    skipped = []
    for pretask in supertask.pretasks:
        if skipCompleted:
            # Look for a fully exhausted, equivalent task on this hashlist.
            existing = TaskUtils::findCompletedEquivalent(hashlistId, pretask, crackerBinaryId)
            if existing:
                skipped.append({pretaskId: pretask.id, matchingTaskId: existing.id})
                continue
        Task::createFromPretask(pretask, wrapper, crackerBinaryId)
    # Nothing was instantiated: every pretask had a completed equivalent.
    all_skipped = skipCompleted and len(skipped) == len(supertask.pretasks)
    if all_skipped:
        return warning("All pretasks already completed against this hashlist. Uncheck 'Skip pretasks already completed' to force re-run.")
    return {wrapperId: wrapper.id, skippedPretasks: skipped}
Per-pretask DB lookup at assignment time only — not on a hot path.
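For readers who want to exercise the control flow, here is a runnable Python sketch of the same loop with persistence factored out. It is an assumption-laden illustration: `plan_assignment` is a hypothetical name, and the `find_completed_equivalent` callable stands in for `TaskUtils::findCompletedEquivalent`, returning a matching task id or `None`.

```python
from typing import Callable, Dict, List, Optional


def plan_assignment(
    pretask_ids: List[int],
    find_completed_equivalent: Callable[[int], Optional[int]],
    skip_completed: bool = False,
) -> Dict[str, object]:
    """Split pretasks into those to instantiate and those to skip."""
    to_create: List[int] = []
    skipped: List[Dict[str, int]] = []
    for pretask_id in pretask_ids:
        # The lookup only runs when the option is enabled (default: off).
        match = find_completed_equivalent(pretask_id) if skip_completed else None
        if match is not None:
            skipped.append({"pretaskId": pretask_id, "matchingTaskId": match})
            continue
        to_create.append(pretask_id)
    return {
        "create": to_create,
        "skipped": skipped,
        # True only when there was something to assign and all of it matched.
        "allSkipped": bool(pretask_ids) and not to_create,
    }
```

Planning before writing is one possible design choice: it lets the caller detect the all-skipped case before any TaskWrapper row exists, instead of leaving an empty wrapper behind.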
Known limitations / accepted risks for v1
These are gaps to document in the option's tooltip / docs, deliberately not blocking v1:
Hashes added since the prior run. If new hashes were added to the hashlist between the original task's completion and this assignment, the dedup is wrong — those new hashes need the attack. The current schema doesn't expose a Hashlist.lastHashAddedAt to compare against. v1 tooltip should call this out: "This option assumes the hashlist's hash set has not changed since the prior run. If you've added hashes, leave it unchecked." v2 could add a lastHashAddedAt column and gate dedup on that timestamp vs the prior task's last chunk activity.
File content mutation under the same fileId. The match compares fileId, not file content. If a wordlist's bytes were replaced (same fileId, new content), dedup would incorrectly skip the pretask. v2 could compare File.size + a content hash. v1 trusts fileId.
Superhashlist assignment is out of scope (see "Out of scope for v1").
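The two v2 ideas above (a timestamp gate and a content fingerprint) could look roughly like this in Python. Everything here is hypothetical: `lastHashAddedAt` does not exist in the current schema, and treating an unknown timestamp as "changed" is an assumption chosen to err on the side of re-running.

```python
import hashlib
from datetime import datetime
from typing import Optional, Tuple


def hashlist_unchanged_since(
    last_hash_added_at: Optional[datetime],
    prior_task_last_activity: datetime,
) -> bool:
    # Hypothetical v2 column. None means "unknown" (for example, rows that
    # predate the column), so the conservative answer is to not dedup.
    if last_hash_added_at is None:
        return False
    return last_hash_added_at <= prior_task_last_activity


def file_fingerprint(content: bytes) -> Tuple[int, str]:
    # Size plus SHA-256 of the bytes; two files match only if both agree.
    return len(content), hashlib.sha256(content).hexdigest()
```

A stricter variant might compare against the prior task's first chunk rather than its last, since hashes added mid-run would have missed the chunks that had already finished.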
Out of scope for v1
To keep the initial PR reviewable, the following are explicitly not part of this issue. They would build on the same centralized match function:
Manual task creation form — non-blocking warning when the form is filled in with an attack that already exhausted against the selected hashlist.
Hashlist detail "Completed attacks" view — read-only list of distinct completed attacks.
Pretask / supertask detail "Coverage" view — which hashlists this pretask/supertask has been fully completed against.
Compatibility
API v2: new optional skipAlreadyCompleted boolean on the supertask-assignment / pretask-assignment endpoints; default false.
UI: new checkbox on assignment dialog, unchecked by default. Tooltip explains semantics and limitations.
No breaking changes to existing behavior, API responses, or DB schema (the match function is read-only; only new Task rows are affected, and only when the option is opted in).
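To illustrate the "default false" compatibility guarantee, here is a short Python sketch of how an endpoint might read the flag. The payload shape and the `read_skip_flag` helper are assumptions for illustration only; the actual API v2 request format is not specified in this issue.

```python
from typing import Any, Mapping


def read_skip_flag(payload: Mapping[str, Any]) -> bool:
    # Absent key means the option is off, so existing clients see no change.
    value = payload.get("skipAlreadyCompleted", False)
    # Reject non-boolean values instead of coercing, so a string such as
    # "false" cannot silently enable skipping.
    if not isinstance(value, bool):
        raise ValueError("skipAlreadyCompleted must be a boolean")
    return value
```

A request that omits the field behaves exactly as it does today, which is what keeps the change non-breaking for existing consumers.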
Related issues
TaskWrapper / Task / Supertask model redesign (naming/structure, complementary)