PALEO Project Check-in 2

Short, honest progress report focused on what is actually implemented in the repo right now.

IN PROGRESS: Live capture and control prototypes are working; full Letta wiring is the main remaining gap.

Project Snapshot

  • Title: PALEO (with Primal Mind)
  • Team: World's Finest
  • Members: Laura Wetherhold, Alexus Aguirre Arias
  • Core aim: believable per-dinosaur behavior loops, not static NPC scripts.

Status at a Glance

WORKING: Live screen feed, HUD/overlay, and guarded keyboard path.

NEXT: Real Letta session in the middle of the live loop, plus in-game Path of Titans (PoT) smoke tests.

Check-in Questions (1 to 9)

1-2) Project title and team

Project title: PALEO (with Primal Mind)

Team: World's Finest (Laura Wetherhold, Alexus Aguirre Arias)

3) Problem to solve

Many dinosaur-game NPCs feel stale and interchangeable. PALEO focuses on agentic dinosaurs, each with its own personality thresholds, that read the player's screen, form thoughts, and choose actions, so behavior feels more immersive and less scripted. The main goal is believable, animal-like behavior; quest autopilot is a stretch idea.
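To make "personality thresholds" concrete, here is a minimal sketch of how two individuals could react differently to the same perceived threat. The `Personality` fields, the 0 to 1 threat score, and the action names are all assumptions made for illustration; they are not taken from the repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    """Hypothetical per-dinosaur thresholds on a 0..1 threat score."""
    name: str
    flee_above: float   # threat at or above this value triggers fleeing
    alert_above: float  # threat at or above this value (but below flee) triggers watching


def choose_action(personality: Personality, threat: float) -> str:
    """Map one perceived threat score to an action for this individual."""
    if not 0.0 <= threat <= 1.0:
        raise ValueError("threat must be in [0, 1]")
    if threat >= personality.flee_above:
        return "flee"
    if threat >= personality.alert_above:
        return "watch"
    return "graze"


timid = Personality("timid", flee_above=0.3, alert_above=0.1)
bold = Personality("bold", flee_above=0.8, alert_above=0.5)

# The same observation produces different behavior per individual.
for dino in (timid, bold):
    print(dino.name, choose_action(dino, threat=0.4))
```

With these illustrative values, a threat of 0.4 makes the timid individual flee while the bold one keeps grazing, which is the kind of per-individual difference the project is aiming for.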

4-5) AI functions and agentic AI usage
  • ML pipeline: trained ResNet-18 on Snapshot Serengeti predator/non-predator images, then fine-tuned on 300 manually labeled Path of Titans screenshots to reduce domain shift.
  • Model-selection shift: lr=1e-4 was the strongest setting for the nearly 10k-image balanced Serengeti run and is still the strongest by PoT validation accuracy, but the live-agent goal prioritizes predator recall over accuracy, so the live operating point is chosen on recall instead.
  • CV: live screen capture + frame stats + HUD/overlay visibility.
  • Search: wiki RAG path in src/wiki_rag.py.
  • Letta role: planned main decision/memory layer; current repo has Letta-shaped tools/stubs.
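As an illustration of what "frame stats" on a captured frame could look like, the sketch below computes mean brightness and the share of dark pixels. It is a stdlib-only example under stated assumptions: the `frame_stats` name, the nested-list frame layout, and the `dark_below` cutoff are hypothetical, and the repo's actual statistics may differ.

```python
from statistics import mean
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def frame_stats(frame: List[List[Pixel]], dark_below: float = 40.0) -> Dict[str, float]:
    """Return mean luma and the fraction of dark pixels for one RGB frame."""
    # Rec. 601 luma weights convert RGB to perceived brightness.
    lumas = [0.299 * r + 0.587 * g + 0.114 * b for row in frame for (r, g, b) in row]
    if not lumas:
        raise ValueError("empty frame")
    dark_count = sum(1 for luma in lumas if luma < dark_below)
    return {
        "mean_luma": mean(lumas),
        "dark_fraction": dark_count / len(lumas),
    }


# Synthetic 2x2 frame: two black pixels and two white pixels.
frame = [
    [(0, 0, 0), (255, 255, 255)],
    [(0, 0, 0), (255, 255, 255)],
]
stats = frame_stats(frame)
print(stats)
```

Cheap statistics like these can be shown on the HUD and used as a sanity check that the capture is actually seeing the game rather than a black or minimized window.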
6) Datasets
  • Primary: Snapshot Serengeti (Dryad CSV workflow documented in README).
  • Serengeti split: 6,529 training images and 2,142 validation images from the existing manifest split.
  • Domain adaptation set: 300 labeled Path of Titans screenshots (predator / non_predator).
  • PoT split: 240 training screenshots and 60 validation screenshots.
  • Verification holdout: 10 Path of Titans test images (labeled by filename and by hand).
  • Optional later: larger Kaggle packs (deferred for size/bandwidth reasons).
  • Modality: image + CSV metadata now; text for RAG; gameplay frames later.
  • Preprocessing: prepare_data -> manifest/splits -> local JPEGs -> train/eval scripts.
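The 240/60 PoT split corresponds to an 80/20 split of 300 images. The report does not say how the split was drawn, so the following is only one plausible way to produce it: a seeded, per-class (stratified) shuffle. The `stratified_split` helper, the filenames, and the 150/150 class balance are assumptions for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (filename, label)


def stratified_split(
    samples: Sequence[Sample], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into train/val while keeping the class ratio in both parts."""
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[Sample] = []
    val: List[Sample] = []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Hypothetical manifest: 300 screenshots with an assumed 150/150 class balance.
samples = [
    (f"pot_{i:04d}.jpg", "predator" if i < 150 else "non_predator")
    for i in range(300)
]
train, val = stratified_split(samples)
print(len(train), len(val))
```

Stratifying matters on a set this small: with only 60 validation images, an unlucky random draw could leave too few predator examples to measure recall meaningfully.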
7) Evaluation plan
  • Image metrics: accuracy, F1 (especially predator class), confusion matrix.
  • Safety goal: prioritize predator recall (fewer false negatives), even if false positives increase.
  • Thresholding: use an adjustable predator-probability threshold instead of a fixed 0.50.
  • Baselines: a rule/heuristic baseline and a no-training majority-class baseline, compared against the trained runs.
  • Agent checks: loop stability, thought/action alignment, latency, and safe control behavior.
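The image metrics and threshold sweep above can be sketched in a few lines of plain Python. This is not the repo's evaluation script: the function names are hypothetical, and the scores and labels are toy values invented for the example, not project results.

```python
from typing import Dict, Sequence


def confusion_at(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, int]:
    """Count outcomes when 'predator' (label 1) is predicted for prob >= threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for prob, label in zip(probs, labels):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def summarize(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1 for the predator class."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def majority_baseline_accuracy(labels: Sequence[int]) -> float:
    """Accuracy of always predicting the most common class (no training)."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


# Toy scores, invented for illustration only (not project results).
probs = [0.90, 0.60, 0.30, 0.25, 0.70, 0.10, 0.40, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print("majority baseline:", majority_baseline_accuracy(labels))
for threshold in (0.50, 0.35, 0.20):
    print(threshold, summarize(confusion_at(probs, labels, threshold)))
```

On the toy data, lowering the threshold from 0.50 to 0.20 removes both false negatives at the cost of one extra false positive, which is the tradeoff the safety goal accepts.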
8) Current progress
  • Live screen feed: working via run_paleo_live.py, serve_companion.py, and overlay path.
  • Keyboard output: prototype control path works with --enable-control + emergency stop.
  • Agent loop code: Instinct Agent + Primal Mind + action mapping + scenario simulation exist.
  • Training path: Serengeti -> PoT fine-tune pipeline is running end-to-end with saved checkpoints and metrics.
  • Latest Serengeti result: best 15-epoch real-image run is ResNet-18 lr=1e-4 + augmentation with 0.9118 validation accuracy and 0.9206 predator recall on the 2,142-image validation split.
  • 300-image PoT result: lr=1e-4 is the accuracy pick on the 60-image validation split (0.767 accuracy, 0.742 predator recall).
  • Agent safety result: class weighting alone did not fix false negatives at the default 0.50 threshold, so the live-agent operating point switched to lr=3e-5, predator class weight 3.0, and threshold 0.20. This raised predator recall on the 10-image holdout from 0.571 to 0.714 (4 of 7 to 5 of 7 predators detected).
  • Safety controls added: inference now supports predator threshold tuning and threshold-sweep metrics for recall/precision tradeoffs.
  • Main gap: real Letta session integration in the middle of the live loop.
  • Still pending: full Path of Titans in-game smoke test (focused game window + simple walk control).
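The guarded keyboard path can be pictured as a thin wrapper that refuses to send keys unless control was explicitly enabled and no emergency stop has been triggered. The sketch below is an assumption about the shape of such a guard, not the repo's code: `GuardedKeyboard` and its methods are hypothetical, and only the idea of an opt-in flag (`--enable-control`) plus an emergency stop comes from the report.

```python
import threading
from typing import Callable, List


class GuardedKeyboard:
    """Hypothetical guard: keys pass through only if enabled and not stopped."""

    def __init__(self, send_key: Callable[[str], None], enable_control: bool = False):
        self._send_key = send_key
        self._enabled = enable_control   # mirrors an opt-in flag such as --enable-control
        self._stop = threading.Event()   # safe to set from another thread or a hotkey hook

    def emergency_stop(self) -> None:
        """Block all further key presses until the process is restarted."""
        self._stop.set()

    def press(self, key: str) -> bool:
        """Send the key if allowed; return whether it was actually sent."""
        if not self._enabled or self._stop.is_set():
            return False
        self._send_key(key)
        return True


# Record keys in a list instead of touching the real keyboard.
sent: List[str] = []
keyboard = GuardedKeyboard(sent.append, enable_control=True)
keyboard.press("w")
keyboard.emergency_stop()
keyboard.press("w")  # ignored: the stop flag is set
print(sent)
```

Injecting `send_key` keeps the guard testable without a focused game window, which is useful while the in-game smoke test is still pending.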
9) Next-step plan
  • Wire Letta as the real memory/reason/decision layer.
  • Run in-game tests: confirm capture sees PoT and control can perform simple movement safely.
  • Add report-ready experiment tables/figures from local runs.
  • Continue improving behavior quality and realism after Letta hookup is stable.
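One way to prepare for the Letta hookup is to put the decision step behind a small interface, so the current stub and a future Letta-backed layer are interchangeable in the live loop. The sketch below makes no Letta API calls; `DecisionLayer`, `StubDecisionLayer`, `run_loop`, and the observation keys are hypothetical names. Only the 0.20 predator threshold is taken from the results above.

```python
import time
from typing import Dict, List, Protocol, Sequence


class DecisionLayer(Protocol):
    """Anything that turns one observation into one action name."""

    def decide(self, observation: Dict[str, float]) -> str: ...


class StubDecisionLayer:
    """Stand-in for the planned Letta session: rule-based and memoryless."""

    def __init__(self, predator_threshold: float = 0.20):
        self.predator_threshold = predator_threshold

    def decide(self, observation: Dict[str, float]) -> str:
        if observation.get("predator_prob", 0.0) >= self.predator_threshold:
            return "flee"
        return "walk"


def run_loop(
    layer: DecisionLayer, observations: Sequence[Dict[str, float]]
) -> List[Dict[str, object]]:
    """Feed observations through the layer and log each action with its latency."""
    log: List[Dict[str, object]] = []
    for observation in observations:
        start = time.perf_counter()
        action = layer.decide(observation)
        latency_s = time.perf_counter() - start
        log.append({"action": action, "latency_s": latency_s})
    return log


log = run_loop(StubDecisionLayer(), [{"predator_prob": 0.10}, {"predator_prob": 0.35}])
print([entry["action"] for entry in log])
```

Logging per-decision latency in the same loop gives a baseline to compare against once a real session with memory and network round trips replaces the stub.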