[FEATURE]: Skip pretasks already completed against the target hashlist when assigning a supertask #2167

@linuxkd

Description

When the same supertask (or a different supertask sharing some pretasks) is assigned to a hashlist against which those exact attacks have already been fully exhausted, Hashtopolis re-instantiates every pretask as a new Task. The duplicate tasks then re-run the same attackCmd, with the same files and the same cracker, against the same hashes, burning GPU-hours for zero new coverage.

This is easy to hit in practice with phased workflows: a "quick" supertask runs to completion → no cracks → operator follows up with a broader "deep" supertask that includes the quick pretasks as a subset. Today, the only way to avoid the redundant runs is to manually identify duplicates and archive the tasks post-assignment (we just did this for a hashlist in our deployment — 9 of 40 tasks in the second wrapper were exact duplicates of completed tasks in the first).

Proposed behavior

Add an opt-in "Skip pretasks already completed against this hashlist" option at supertask → hashlist assignment time (and pretask → hashlist assignment, for consistency).

  • Default: OFF — preserves current behavior for all existing API consumers and UI flows.
  • When enabled, for each pretask in the supertask, check whether an equivalent fully-exhausted task already exists on the target hashlist. If yes, skip instantiation.
  • Surface the result on the resulting TaskWrapper page: "Skipped N pretasks (already completed): ".

Match criteria — what counts as "the same attack"

A pretask P is considered an exact duplicate of an existing Task T on hashlist H if all of the following hold:

  1. T.attackCmd == P.attackCmd, compared after normalization (whitespace, and rule-flag ordering if relevant).
  2. The set of fileIds associated with T (via FileTask) equals the set associated with P (via FilePretask).
  3. T.crackerBinaryId == P.crackerBinaryId AND T.crackerBinaryTypeId == P.crackerBinaryTypeId (a hashcat version change invalidates the dedup — conservative).
  4. T.keyspace > 0 AND T.keyspaceProgress >= T.keyspace (fully exhausted; partials are NOT skipped — the remaining keyspace is still valuable work).
  5. T.taskWrapper.hashlistId == H.

Archived tasks (T.isArchived = 1) count as matches if they meet the above — an archived-but-exhausted task already did its work.
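Criterion 1 leaves the exact normalization open. A minimal sketch in Python (for illustration only; normalize_attack_cmd and same_attack_cmd are hypothetical names), assuming whitespace is the only thing normalized. Token order is deliberately kept as-is, because reordering stacked -r rule files or the left/right wordlists of a combinator attack can change what hashcat generates.

```python
def normalize_attack_cmd(cmd: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends.

    Token order is preserved on purpose: reordering arguments such as
    stacked -r rule files can change the candidates hashcat produces.
    """
    return " ".join(cmd.split())


def same_attack_cmd(a: str, b: str) -> bool:
    # Criterion 1 under whitespace-only normalization.
    return normalize_attack_cmd(a) == normalize_attack_cmd(b)


# Extra spaces and a trailing newline do not break the match.
print(same_attack_cmd("#HL# -a 0  rockyou.txt -r best64.rule\n",
                      "#HL# -a 0 rockyou.txt -r best64.rule"))  # True
```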

Explicitly NOT skipped

  • Partial tasks (killed agents, cancelled mid-run, hashcat crashes) — re-running finishes the remainder.
  • Tasks with keyspace = 0 (never started / never benchmarked).
  • Tasks still in progress on another wrapper (only completed matches qualify).
  • Tasks with a different cracker binary id/type.
  • Tasks where any file in the set differs by fileId.
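The match criteria and the exclusions above can be read as a single predicate. A sketch in Python under stated assumptions: TaskRow and PretaskSpec are hypothetical in-memory views whose field names follow this issue, and the pretask is assumed to carry only the cracker binary type, with the concrete crackerBinaryId supplied at assignment time (as in the findCompletedEquivalent signature).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TaskRow:
    # Hypothetical view of an existing Task; field names follow this issue.
    attack_cmd: str
    file_ids: FrozenSet[int]        # via FileTask
    cracker_binary_id: int
    cracker_binary_type_id: int
    keyspace: int
    keyspace_progress: int
    hashlist_id: int                # T.taskWrapper.hashlistId
    is_archived: bool = False


@dataclass(frozen=True)
class PretaskSpec:
    # Hypothetical view of a pretask; the binary id is chosen at assignment time.
    attack_cmd: str
    file_ids: FrozenSet[int]        # via FilePretask
    cracker_binary_type_id: int


def mismatch_reason(t: TaskRow, p: PretaskSpec, hashlist_id: int,
                    cracker_binary_id: int) -> Optional[str]:
    """Return None if t is a completed equivalent of p on hashlist_id, else why not."""
    if t.hashlist_id != hashlist_id:
        return "different hashlist"
    if " ".join(t.attack_cmd.split()) != " ".join(p.attack_cmd.split()):
        return "different attackCmd"
    if t.file_ids != p.file_ids:
        return "different file set"
    if (t.cracker_binary_id, t.cracker_binary_type_id) != (cracker_binary_id, p.cracker_binary_type_id):
        return "different cracker binary id/type"
    if t.keyspace <= 0:
        return "keyspace 0 (never started or benchmarked)"
    if t.keyspace_progress < t.keyspace:
        return "not fully exhausted (partial or still running)"
    # is_archived is deliberately ignored: an archived-but-exhausted task already did its work.
    return None
```

Returning a reason instead of a bare boolean is a design choice: the same function could feed the "Skipped N pretasks" summary and explain near-misses to the operator.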

Centralized match function

To keep this consistent across call sites, the matching logic should live in a single utility (e.g., TaskUtils::findCompletedEquivalent($hashlistId, $pretaskOrSpec, $crackerBinaryId)), called from:

  • Supertask → hashlist assignment (this issue's primary surface)
  • Pretask → hashlist assignment
  • Future read-only surfaces (see "Out of scope for v1" below)

A complementary TaskUtils::listCompletedAttacks($hashlistId) would return the distinct set of (attackCmd, fileIdSet, crackerBinaryId) tuples fully exhausted against a hashlist — useful both as a read-only operator view and as the inverse of the dedup query.
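A rough Python sketch of what TaskUtils::listCompletedAttacks could compute, assuming the hashlist's tasks are already loaded into memory (TaskRow and attack_key are hypothetical). It makes the "inverse of the dedup query" relationship concrete: a pretask is a duplicate when its key is in the returned set, with the crackerBinaryTypeId check left implicit in crackerBinaryId as in the tuple above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple

# (normalized attackCmd, fileIdSet, crackerBinaryId), as described in the issue.
AttackKey = Tuple[str, FrozenSet[int], int]


@dataclass(frozen=True)
class TaskRow:
    # Hypothetical in-memory task view; archived tasks are included on purpose.
    attack_cmd: str
    file_ids: FrozenSet[int]
    cracker_binary_id: int
    keyspace: int
    keyspace_progress: int
    hashlist_id: int


def attack_key(attack_cmd: str, file_ids: Iterable[int], cracker_binary_id: int) -> AttackKey:
    return (" ".join(attack_cmd.split()), frozenset(file_ids), cracker_binary_id)


def list_completed_attacks(tasks: Iterable[TaskRow], hashlist_id: int) -> Set[AttackKey]:
    """Distinct attacks fully exhausted against one hashlist; duplicates collapse."""
    return {
        attack_key(t.attack_cmd, t.file_ids, t.cracker_binary_id)
        for t in tasks
        if t.hashlist_id == hashlist_id
        and t.keyspace > 0
        and t.keyspace_progress >= t.keyspace
    }
```

Because the result is a set of keys, the same value could back a read-only "Completed attacks" view and a membership test at assignment time.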

Backend sketch

function assignSupertaskToHashlist($supertaskId, $hashlistId, $crackerBinaryId, $skipCompleted = false):
    toCreate = []
    skipped = []
    for pretask in supertask.pretasks:
        if skipCompleted:
            existing = TaskUtils::findCompletedEquivalent(hashlistId, pretask, crackerBinaryId)
            if existing:
                skipped.append({pretaskId: pretask.id, matchingTaskId: existing.id})
                continue
        toCreate.append(pretask)
    if toCreate is empty:
        # nothing to run: return before creating the wrapper so no empty TaskWrapper is left behind
        return warning("All pretasks already completed against this hashlist. Uncheck 'Skip pretasks already completed' to force re-run.")
    wrapper = TaskWrapper::create(...)
    for pretask in toCreate:
        Task::createFromPretask(pretask, wrapper, crackerBinaryId)
    return {wrapperId: wrapper.id, skippedPretasks: skipped}

The lookup runs once per pretask, at assignment time only, so it is not on a hot path.
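A sketch of that per-pretask lookup against a toy SQLite schema, in Python for illustration. The table and column names are modelled on the fields named in this issue (Task, TaskWrapper, FileTask) and are an assumption, not the actual Hashtopolis schema; find_completed_equivalent is a hypothetical stand-in for TaskUtils::findCompletedEquivalent. Scalar criteria are filtered in SQL, while attackCmd normalization and fileId-set equality are checked in Python, which avoids relying on GROUP_CONCAT ordering.

```python
import sqlite3
from typing import Iterable, Optional

# Toy schema (assumed); the real Hashtopolis tables may differ.
SCHEMA = """
CREATE TABLE TaskWrapper (taskWrapperId INTEGER PRIMARY KEY, hashlistId INTEGER);
CREATE TABLE Task (taskId INTEGER PRIMARY KEY, taskWrapperId INTEGER, attackCmd TEXT,
                   crackerBinaryId INTEGER, crackerBinaryTypeId INTEGER,
                   keyspace INTEGER, keyspaceProgress INTEGER, isArchived INTEGER);
CREATE TABLE FileTask (fileTaskId INTEGER PRIMARY KEY, fileId INTEGER, taskId INTEGER);
"""


def find_completed_equivalent(conn: sqlite3.Connection, hashlist_id: int, attack_cmd: str,
                              file_ids: Iterable[int], cracker_binary_id: int,
                              cracker_binary_type_id: int) -> Optional[int]:
    """Return the taskId of a fully exhausted equivalent task, or None."""
    # Cheap scalar criteria in SQL; isArchived is deliberately not filtered.
    rows = conn.execute(
        """SELECT t.taskId, t.attackCmd FROM Task t
           JOIN TaskWrapper w ON w.taskWrapperId = t.taskWrapperId
           WHERE w.hashlistId = ? AND t.crackerBinaryId = ? AND t.crackerBinaryTypeId = ?
             AND t.keyspace > 0 AND t.keyspaceProgress >= t.keyspace""",
        (hashlist_id, cracker_binary_id, cracker_binary_type_id),
    ).fetchall()
    wanted_cmd = " ".join(attack_cmd.split())
    wanted_files = set(file_ids)
    for task_id, cmd in rows:
        if " ".join(cmd.split()) != wanted_cmd:
            continue
        # Exact fileId-set equality (criterion 2).
        task_files = {row[0] for row in conn.execute(
            "SELECT fileId FROM FileTask WHERE taskId = ?", (task_id,))}
        if task_files == wanted_files:
            return task_id
    return None
```

In use, the connection would be opened once per assignment request and the function called for each pretask in the supertask.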

Known limitations / accepted risks for v1

These are gaps to document in the option's tooltip / docs, deliberately not blocking v1:

  • Hashes added since the prior run. If new hashes were added to the hashlist between the original task's completion and this assignment, the dedup is wrong — those new hashes need the attack. The current schema doesn't expose a Hashlist.lastHashAddedAt to compare against. v1 tooltip should call this out: "This option assumes the hashlist's hash set has not changed since the prior run. If you've added hashes, leave it unchecked." v2 could add a lastHashAddedAt column and gate dedup on that timestamp vs the prior task's last chunk activity.
  • File content mutation under the same fileId. Compares fileId, not file content. If a wordlist's bytes were replaced (same fileId, new content), dedup would incorrectly skip. v2 could compare File.size + a content hash. v1 trusts fileId.
  • Superhashlist assignment is out of scope (see "Out of scope for v1").
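The two v2 mitigations could be sketched as extra gates in Python. Everything here is hypothetical: Hashlist.lastHashAddedAt does not exist today, "last chunk activity" stands for whatever per-task timestamp v2 would use, and the fingerprint would have to be recorded when the original task ran so it can be compared with the file's current (size, hash). Missing data fails safe, meaning the pretask is not skipped.

```python
import hashlib
import os
from datetime import datetime
from typing import Optional, Tuple


def hashes_unchanged_since(last_hash_added_at: Optional[datetime],
                           prior_task_last_activity: Optional[datetime]) -> bool:
    """True only if the last hash was added strictly before the prior task's last activity.

    Both inputs are hypothetical v2 fields; None means "unknown" and blocks the skip.
    """
    if last_hash_added_at is None or prior_task_last_activity is None:
        return False
    return last_hash_added_at < prior_task_last_activity


def file_fingerprint(path: str, chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """(size in bytes, SHA-256 hex digest), read in chunks so large wordlists stay out of RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Hashing multi-gigabyte wordlists is not free, so a real implementation would more likely compute the fingerprint once at upload time and store it than recompute it on every assignment.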

Out of scope for v1

To keep the initial PR reviewable, the following are explicitly not part of this issue. They would build on the same centralized match function:

  • Manual task creation form — non-blocking warning when the form is filled in with an attack that already exhausted against the selected hashlist.
  • Hashlist detail "Completed attacks" view — read-only list of distinct completed attacks.
  • Pretask / supertask detail "Coverage" view — which hashlists this pretask/supertask has been fully completed against.
  • Superhashlist semantics — dedup against component hashlists. Related to [ENHANCEMENT]: Add superhashlist to supertask #1294.
  • Hashes-added-since detection — see Known Limitations above.
  • File-content-mutation detection — see Known Limitations above.

Related issues

Compatibility

  • API v2: new optional skipAlreadyCompleted boolean on the supertask-assignment / pretask-assignment endpoints; default false.
  • UI: new checkbox on assignment dialog, unchecked by default. Tooltip explains semantics and limitations.
  • No breaking changes to existing behavior, API responses, or DB schema (the match function is read-only; only new Task rows are affected, and only when the option is opted in).
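On the API v2 side, reading the new flag could look like this Python sketch over an already-decoded JSON body. Only the field name skipAlreadyCompleted comes from this proposal; read_skip_flag is a hypothetical helper, and treating null like an absent field is an assumption. Rejecting non-boolean values instead of coercing them keeps the string "false" from silently enabling the option.

```python
from typing import Any, Mapping


def read_skip_flag(body: Mapping[str, Any]) -> bool:
    """Return the optional skipAlreadyCompleted flag, defaulting to False.

    Absent or null means False, preserving today's behaviour. Anything other
    than a real JSON boolean is rejected rather than guessed at.
    """
    value = body.get("skipAlreadyCompleted")
    if value is None:
        return False
    if not isinstance(value, bool):
        raise ValueError("skipAlreadyCompleted must be a boolean")
    return value
```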

Metadata

Assignees

No one assigned

    Labels

    new feature (New feature to be added), server (Hashtopolis API/Server related), ui (Hashtopolis UI related)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
