Skip to content

Fix name mangler to preserve definition when mangled name already exists#8930

Open
tautschnig wants to merge 2 commits into
diffblue:developfrom
tautschnig:fix-8254-local_bitvector_analysist
Open

Fix name mangler to preserve definition when mangled name already exists#8930
tautschnig wants to merge 2 commits into
diffblue:developfrom
tautschnig:fix-8254-local_bitvector_analysist

Conversation

@tautschnig

Copy link
Copy Markdown
Collaborator

When --export-file-local-symbols is used and files are compiled together, a forward declaration using the mangled name (e.g., __CPROVER_file_local_a_c_foo) may already exist in the symbol table and function map before the mangler runs. The mangler's symbol_table.insert() silently failed in this case, leaving the declaration's type (with empty parameter identifiers) in place instead of the definition's type. This caused an invariant violation in goto_symext::locality() when it tried to look up the empty parameter identifier.

Fix the mangler to:

  1. When insert fails and the existing symbol is a declaration (no body), update it with the definition's type/value. Log a warning when the existing symbol already has a definition.
  2. When the mangled name already has an empty function map entry, swap in the definition's body and parameter_identifiers.
  3. Rename parameter_identifiers when renaming functions (matching what link_goto_model already does).

Fixes: #8254

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Mar 30, 2026
Copilot AI review requested due to automatic review settings March 30, 2026 11:59

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a crash / invariant violation when --export-file-local-symbols is used and multiple C files are compiled in one goto-cc invocation, where a forward declaration of an already-mangled name can pre-exist in the symbol table and goto-function map.

Changes:

  • Update name_mangler to handle symbol-table insert collisions by replacing an existing declaration with the definition’s type/value.
  • Handle goto-function-map collisions by swapping in the real function body/parameter identifiers when the pre-existing entry is empty.
  • Add a regression test covering the multi-file compile-together scenario from #8254.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/goto-programs/name_mangler.h Adds collision-handling logic for symbol-table and goto-function-map entries during mangling.
regression/goto-cc-file-local/multi-file-compile-together/test.desc New regression driver asserting the correct failure result (no crash) for the reproducer.
regression/goto-cc-file-local/multi-file-compile-together/main.c Reproducer: forward-declares mangled name and calls it.
regression/goto-cc-file-local/multi-file-compile-together/a.c Reproducer: defines static foo with a failing assertion.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 86 to +90
for(const auto &sym : new_syms)
model.symbol_table.insert(sym);
{
auto result = model.symbol_table.insert(sym);
if(!result.second)
{

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old_syms stores iterators into model.symbol_table.symbols (an unordered_map). Inserting new_syms before erasing old_syms can trigger a rehash and invalidate those iterators, making the later erase(sym) undefined behaviour. Prefer storing the old symbol names (irep_idt) and erasing via a fresh find() after insertions, or erase old entries before inserting new ones (or reserve to prevent rehash).

Copilot uses AI. Check for mistakes.
Comment thread src/goto-programs/name_mangler.h Outdated
existing.type = sym.type;
existing.value = sym.value;
existing.is_file_local = sym.is_file_local;
existing.module = sym.module;

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating existing.module (and any other fields used by symbol_table indices) in-place breaks symbol_tablet's internal symbol_module_map invariants: the module/base-name maps are populated at insert-time and symbol_tablet::erase() assumes these fields never change. To keep the symbol table consistent, avoid mutating module on an already-inserted symbol; instead erase the existing entry (by iterator) and re-insert the updated symbol, or keep module unchanged.

Suggested change
existing.module = sym.module;
// Do not change existing.module: it participates in symbol_table indices.

Copilot uses AI. Check for mistakes.
Comment thread src/goto-programs/name_mangler.h Outdated
Comment on lines +91 to +96
symbolt &existing = result.first;
if(existing.value.is_nil())
{
// The existing symbol is a declaration (no body). This happens
// when user code forward-declares the mangled name. Replace it
// with the definition so that parameter identifiers and the

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The insert-collision handling treats any existing symbol with value.is_nil() as a forward declaration, but many non-function symbols can also have a nil value (e.g., uninitialised variables). Before overwriting, ensure the existing entry is actually a function declaration (e.g., existing.is_function() / existing.type.id()==ID_code) and otherwise emit a diagnostic rather than replacing an unrelated symbol.

Copilot uses AI. Check for mistakes.
@codecov

codecov Bot commented Mar 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.05195% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.61%. Comparing base (eec9b3e) to head (c94d2a0).

Files with missing lines Patch % Lines
unit/goto-programs/name_mangler.cpp 98.24% 2 Missing ⚠️
src/goto-programs/name_mangler.h 97.50% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8930      +/-   ##
===========================================
+ Coverage    80.60%   80.61%   +0.01%     
===========================================
  Files         1711     1712       +1     
  Lines       189466   189611     +145     
  Branches        73       73              
===========================================
+ Hits        152722   152864     +142     
- Misses       36744    36747       +3     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

tautschnig and others added 2 commits June 10, 2026 20:29
When --export-file-local-symbols is used and files are compiled together,
a forward declaration using the mangled name (e.g.,
__CPROVER_file_local_a_c_foo) may already exist in the symbol table and
function map before the mangler runs. The mangler's symbol_table.insert()
silently failed in this case, leaving the declaration's type (with empty
parameter identifiers) in place instead of the definition's type. This
caused an invariant violation in goto_symext::locality() when it tried to
look up the empty parameter identifier.

Fix the mangler to:
1. When insert fails and the existing symbol is a function declaration
   (nil value and type ID_code), update it in place with the definition's
   type/value/etc. The symbol's module and base_name are deliberately left
   untouched: both participate in symbol_tablet's indices and must not be
   changed after insertion (symbol_tablet::validate() enforces this). A
   collision with anything else (a symbol that already has a definition, or
   a non-function symbol) falls through to a warning.
2. When the mangled name already has an empty function_map entry, swap in
   the definition's body via goto_functiont::swap() (which also carries
   over function_is_hidden). The entry is looked up before the emplace so
   that the definition is not left moved-from on a colliding emplace; a
   pre-existing, non-empty entry is reported at debug verbosity.
3. Rename parameter_identifiers when renaming functions (matching what
   link_goto_model already does).

The old symbols are now erased by name rather than via stored iterators,
which the preceding insert loop could invalidate by rehashing the symbol
table.

Add a unit test (unit/goto-programs/name_mangler.cpp) driving
function_name_manglert::mangle() against a hand-built goto_modelt; it
covers the forward-declaration success path (asserting symbol_table
consistency via validate()), the warning path when the mangled name
already has a definition, and a non-function symbol collision. The
existing regression test in regression/goto-cc-file-local/
multi-file-compile-together exercises the end-to-end goto-cc scenario.

Fixes: diffblue#8254

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
mangle() had grown to cover five distinct phases (discovery, symbol-table
merge, renaming of symbols, renaming of function bodies, and function-map
merge) in a single ~130-line method. Split it into named private helpers
(collect_file_local_functions, merge_into_symbol_table,
apply_rename_to_symbols, apply_rename_to_functions, merge_into_function_map)
so each phase can be read and tested in isolation.

This is a pure refactoring with no behavioural change; the existing unit
and regression tests continue to pass.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the fix-8254-local_bitvector_analysist branch from 3083aca to c94d2a0 Compare June 10, 2026 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Invariant violation in local_bitvector_analysist when --export-file-local-symbols is used

2 participants