Skip to content

[refactor] Semantic Function Clustering Analysis: Outliers and Duplicate Detection #18187

@github-actions

Description

@github-actions

Overview

Semantic function clustering analysis of 559 non-test Go files across 18 packages in pkg/. Functions were inventoried by file, clustered by naming patterns and purpose, and cross-referenced to detect misplaced functions (outliers) and near-duplicate implementations.

Stats: ~7,500 functions cataloged across pkg/workflow (278 files), pkg/cli (190 files), pkg/parser (37 files), pkg/console (27 files), and 14 utility packages.


Critical Issues

Finding 1 — Duplicate deduplication logic in pkg/parser

File pkg/parser/schema_utilities.go
Function removeDuplicates(strings []string) []string (line ~37)
Duplicate of pkg/sliceutil/sliceutil.go:Deduplicate[T comparable]

The removeDuplicates function in the parser package is a private reimplementation of the already-available generic sliceutil.Deduplicate. The logic is nearly identical:

// pkg/parser/schema_utilities.go — private reimplementation
func removeDuplicates(strings []string) []string {
    seen := make(map[string]bool)
    var result []string
    for _, str := range strings {
        if !seen[str] {
            seen[str] = true
            result = append(result, str)
        }
    }
    return result
}

// pkg/sliceutil/sliceutil.go — canonical generic version
func Deduplicate[T comparable](slice []T) []T {
    seen := make(map[T]bool, len(slice))
    result := make([]T, 0, len(slice))
    for _, item := range slice {
        if !seen[item] {
            seen[item] = true
            result = append(result, item)
        }
    }
    return result
}

Recommendation: Replace removeDuplicates(x) call sites with sliceutil.Deduplicate(x) and delete the private function.


Outlier Functions (Functions in Wrong Files)

Finding 2 — YAML rendering method in redact_secrets.go

File pkg/workflow/redact_secrets.go
Function (*Compiler).renderStepFromMap(yaml, step, data, indent) (line 132)
Expected location pkg/workflow/compiler_yaml_helpers.go

renderStepFromMap renders a GitHub Actions YAML step from a map[string]any. It has no semantic relationship to secret redaction — it is a general YAML serialization helper. It is only called once, from generateCustomSecretMaskingStep in the same file, suggesting it was placed here as a convenience without considering file responsibility.

compiler_yaml_helpers.go already houses similar YAML construction helpers (e.g., ContainsCheckout). Moving renderStepFromMap there would collocate all step-rendering logic.

Recommendation: Move renderStepFromMap to pkg/workflow/compiler_yaml_helpers.go. Update the call site to reference it from there (same package — no import changes needed).


Finding 3 — Generic list formatter in github_tool_to_toolset.go

File pkg/workflow/github_tool_to_toolset.go
Function formatList(items []string) string (line 132)
Expected location pkg/workflow/validation_helpers.go or pkg/workflow/error_helpers.go

github_tool_to_toolset.go is responsible for validating GitHub tool names against permitted toolsets. It contains ValidateGitHubToolsAgainstToolsets and an init() that builds a lookup map — clearly a focused validation file.

formatList is a generic utility that joins items into a quoted, comma-separated string for error messages. It has no toolset-specific logic and could be useful across many validation error messages in the package.

Recommendation: Move formatList to pkg/workflow/validation_helpers.go (or error_helpers.go). The same-package move requires no import changes.


Finding 4 — Validation function in config_helpers.go

File pkg/workflow/config_helpers.go
Function validateTargetRepoSlug(targetRepoSlug string, log *logger.Logger) bool
Expected location pkg/workflow/validation_helpers.go

config_helpers.go documents itself as a file for parsing config values from map[string]any frontmatter. validateTargetRepoSlug performs domain validation (it rejects the wildcard "*" as an invalid repo slug) — this is a validation concern, not a parsing concern.

validation_helpers.go already houses validateIntRange, ValidateRequired, ValidateMaxLength, ValidateInList, etc. — exactly where this function belongs.

Recommendation: Move validateTargetRepoSlug to pkg/workflow/validation_helpers.go and update the single call site in config_helpers.go.


Mixed-Concern Files

Finding 5 — Utility and rendering functions in console_types.go

File pkg/console/console_types.go
Concern Non-type functions mixed with struct/type definitions

console_types.go defines core types (CompilerError, TableConfig, TreeNode, SelectOption, FormField, ListItem) but also hosts five functions that belong in other files:

Function Correct file
ToRelativePath(path string) string console.go or new console_paths.go
RenderTableAsJSON(config TableConfig) (string, error) render.go
FormatErrorWithSuggestions(message string, suggestions []string) string console.go (alongside other Format* functions)
renderTreeSimple(node TreeNode, prefix string, isLast bool) string render.go
findWordEnd(line string, start int) int render.go (helper for tree rendering)

The render.go file already exists in the package and houses RenderStruct and related helpers. Keeping types in console_types.go and moving functions to their semantic homes improves discoverability.

Recommendation: Move the five functions to console.go / render.go as indicated. No external API changes — all are unexported except ToRelativePath, RenderTableAsJSON, and FormatErrorWithSuggestions.


Finding 6 — Parsing and building concerns mixed in safe_output_builder.go

File pkg/workflow/safe_output_builder.go
Concern Parse*Config functions (parsing) mixed with Build*EnvVar functions (code generation)

safe_output_builder.go contains two distinct responsibility clusters:

  • Parsing cluster: ParseTargetConfig, ParseFilterConfig, ParseDiscussionFilterConfig, ParseCloseJobConfig, ParseListJobConfig — these parse raw map[string]any frontmatter into typed config structs.
  • Building cluster: BuildTargetEnvVar, BuildRequiredLabelsEnvVar, BuildRequiredTitlePrefixEnvVar, BuildMaxCountEnvVar, BuildAllowedListEnvVar, BuildCloseJobEnvVars, BuildListJobEnvVars — these generate YAML environment variable declarations from configs.

The file is 369 lines. Separating these into safe_output_parser.go and keeping safe_output_builder.go for the build functions would mirror the pattern already established by safe_inputs_parser.go, safe_inputs_renderer.go, and safe_inputs_generator.go.

Recommendation: Extract Parse* functions into pkg/workflow/safe_output_parser.go. Retain Build* functions in safe_output_builder.go.


Function Cluster Summary

Well-organised clusters (no action needed)

These clusters are already well-structured and serve as positive examples:

  • compiler_safe_outputs_*.go (8 files): Clear decomposition by concern — _config, _core, _discussions, _env, _job, _shared, _specialized, _steps
  • mcp_config_*.go (6 files): Clean separation of _builtin, _custom, _playwright_renderer, _serena_renderer, _types, _utils, _validation
  • compile_*.go in pkg/cli (10 files): Good decomposition of compilation pipeline — _command, _helpers, _config, _orchestration, _orchestrator, _output_formatter, _post_processing, _stats, _validation, _workflow_processor
  • logs_*.go in pkg/cli (9 files): Clear separation of log concerns — _cache, _command, _display, _download, _github_api, _metrics, _models, _orchestrator, _parsing_*, _report, _utils
  • mcp_inspect_*.go in pkg/cli (7 files): Fine-grained inspection subcommand decomposition
  • Separate engine files: claude_engine.go, codex_engine.go, copilot_engine.go, gemini_engine.go — each engine is self-contained
Semantic clustering results (parse functions)

Parse functions (parse* / Parse*) are intentionally spread across 25 files in pkg/workflow. Analysis shows this is appropriate: each parser is co-located with its domain type (e.g., trigger_parser.go, slash_command_parser.go, tools_parser.go, safe_inputs_parser.go). The only outlier is parseRequiredLabelsFromConfig and parseRequiredTitlePrefixFromConfig in safe_output_builder.go, which are config-parsing helpers in a code-generation file (addressed in Finding 6 above).


Refactoring Recommendations

Priority 1 — High Impact

  1. Consolidate removeDuplicatessliceutil.Deduplicate
    • pkg/parser/schema_utilities.go → replace with sliceutil.Deduplicate(x)
    • Adds a new import github.com/github/gh-aw/pkg/sliceutil to the parser package

Priority 2 — Medium Impact

  1. Move renderStepFromMap to compiler_yaml_helpers.go

    • Improves cohesion of redact_secrets.go; all YAML step helpers in one place
  2. Reorganise console_types.go — move 5 non-type functions to console.go / render.go

Priority 3 — Low Impact

  1. Move formatList to validation_helpers.go — makes the formatter reusable across other error messages
  2. Move validateTargetRepoSlug to validation_helpers.go — consistent with existing validation helper pattern
  3. Extract safe_output_parser.go from safe_output_builder.go — mirrors safe_inputs_parser.go pattern

Implementation Checklist

  • Replace parser.removeDuplicates with sliceutil.Deduplicate[string]; delete the private function
  • Move (*Compiler).renderStepFromMap from redact_secrets.go to compiler_yaml_helpers.go
  • Move formatList from github_tool_to_toolset.go to validation_helpers.go
  • Move validateTargetRepoSlug from config_helpers.go to validation_helpers.go
  • Move non-type functions out of console_types.go into console.go / render.go
  • Extract Parse*Config functions from safe_output_builder.go into new safe_output_parser.go
  • Run make test-unit and make lint after each change

Analysis Metadata

Metric Value
Total Go files analyzed 559 (non-test)
Total packages scanned 18
Functions cataloged (approx.) ~7,500
Clusters identified 6 named clusters (compiler, mcp, logs, engine, compile, import)
Outliers found 4 functions in wrong files
Duplicate implementations detected 1 confirmed duplicate
Mixed-concern files 2 (safe_output_builder.go, console_types.go)
Detection method Naming-pattern clustering + Serena semantic analysis + manual inspection
Analysis date 2026-02-24

References:

Generated by Semantic Function Refactoring

  • expires on Feb 26, 2026, 5:31 PM UTC

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions