Why
The LLM-based API researcher agent sometimes generates duplicate or near-duplicate version entries in api_research_report.json. For example, when researching an API that only has v1 (such as the Google Contacts/People API), the LLM may create two entries, one for "v1" and another for "REST" or a renamed variant, both pointing to the same base path. The system then incorrectly reports "primary_version=v1, 2 versions" when only one version exists.
This causes wasted crawling effort and potentially confusing deduplication behavior downstream.
What
- Prompt fix (prompt.py): Added explicit guidance not to invent or hallucinate versions, clarified that different API products (e.g., People API vs. Contacts API) are not different "versions", and emphasized listing only documented versions.
- Post-processing validation (agent.py): Added a _deduplicate_versions() static method to APIResearcherRunner that merges entries with the same normalized version identifier or base_path. Keeps the first (highest-priority) entry on conflict.
- Report writeback: After deduplication, writes the cleaned report back to disk so the coordinator reads the corrected data.
- Tests: Added 9 unit tests covering edge cases (empty input, a single entry, distinct versions, duplicate version IDs, duplicate base paths, case insensitivity, trailing slashes).
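A minimal sketch of the post-processing step and the writeback, assuming the report keeps its entries in a top-level versions list whose items carry version and base_path keys. Those key names and the write_deduplicated_report helper are hypothetical and not taken from agent.py; only the class and method names come from this PR.

```python
import json
from pathlib import Path
from typing import Any


class APIResearcherRunner:
    """Minimal stand-in; only the dedup helper is sketched here."""

    @staticmethod
    def _deduplicate_versions(versions: list[dict[str, Any]]) -> list[dict[str, Any]]:
        """Keep the first entry for each normalized version id or base path.

        Entries are scanned in order, so on a conflict the earlier
        (highest-priority) entry is kept and the later one is dropped.
        """
        seen_versions: set[str] = set()
        seen_paths: set[str] = set()
        kept: list[dict[str, Any]] = []
        for entry in versions:
            # Assumed field names; the real report schema may differ.
            # Version: lowercase + strip whitespace. Path: lowercase + strip trailing slashes.
            version_key = str(entry.get("version") or "").strip().lower()
            path_key = str(entry.get("base_path") or "").lower().rstrip("/")
            is_duplicate = (version_key and version_key in seen_versions) or (
                path_key and path_key in seen_paths
            )
            if is_duplicate:
                continue
            kept.append(entry)
            # Empty keys are not recorded, so entries missing a field never match on it.
            if version_key:
                seen_versions.add(version_key)
            if path_key:
                seen_paths.add(path_key)
        return kept


def write_deduplicated_report(path: Path) -> dict[str, Any]:
    """Load the report, deduplicate its version list, and overwrite the file."""
    report = json.loads(path.read_text(encoding="utf-8"))
    report["versions"] = APIResearcherRunner._deduplicate_versions(
        report.get("versions", [])
    )
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```

Because a later entry is compared only against entries already kept, it is dropped if it collides with any of them on either field. In the Contacts/People case, a "v1" entry followed by a "REST" entry at the same base path collapses to the single "v1" entry.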
How to Test
cd /workspace/integrator
source .venv/bin/activate
python -m pytest cortex/tests/test_agents/test_action_finder/test_api_researcher.py -v -o "addopts="
All 9 tests pass. Existing 383 action_finder tests also pass unchanged.
Notes
- The dedup logic normalizes version identifiers (lowercase, strip whitespace) and base paths (lowercase, strip trailing slashes) before comparing
- First entry wins on conflict, which preserves the LLM's ordering (typically highest priority first)
- The prompt change alone may not fully prevent the issue since LLM behavior is non-deterministic; the post-processing validation provides a deterministic safety net
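The normalization rules in these notes can be pinned down as two small helpers. This is a sketch: the function names and the choice to map an empty result to None are assumptions, not code from this PR.

```python
from typing import Optional


def normalize_version(raw: Optional[str]) -> Optional[str]:
    # Lowercase and strip surrounding whitespace.
    # Assumption: an empty result means "no identifier" and is returned as None
    # so that entries missing a version never collide with each other.
    if raw is None:
        return None
    return raw.strip().lower() or None


def normalize_base_path(raw: Optional[str]) -> Optional[str]:
    # Lowercase and strip trailing slashes.
    # Caveat: a bare "/" strips to "" and is therefore also treated as missing.
    if raw is None:
        return None
    return raw.lower().rstrip("/") or None
```

Returning None for an empty key lets the caller skip matching on that field, so two entries that both lack a base path are not merged merely because both normalize to an empty string.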
Triggered by: pranai@usefulagents.com | Source: slack
Session: https://zen-api-production-4c98.up.railway.app/dashboard/#/chat/zen-7aa25e50348b