PALEO Project Check-in 2

Short, honest progress report focused on what is actually implemented in the repo right now.

IN PROGRESS: Live capture + control prototypes are working; full Letta wiring is the main remaining gap.

Project Snapshot

Title: PALEO (with Primal Mind)
Team: World's Finest
Members: Laura Wetherhold, Alexus Aguirre Arias
Core aim: believable per-dinosaur behavior loops, not static NPC scripts.

Status at a Glance

WORKING: Live screen feed, HUD/overlay, and guarded keyboard path.

NEXT: Real Letta session in the middle of the live loop + in-game Path of Titans (PoT) smoke tests.

Check-in Questions (1 to 9)

1-2) Project title and team

Project title: PALEO (with Primal Mind)

Team: World's Finest (Laura Wetherhold, Alexus Aguirre Arias)

3) Problem to solve

Many dinosaur-game NPCs feel stale and interchangeable. PALEO focuses on agentic dinosaurs, each with its own personality thresholds, that read the screen, form thoughts, and choose actions, so behavior feels more immersive and less scripted. Main goal: believable, animal-like behavior. Stretch idea: quest autopilot.

4-5) AI functions and agentic AI usage

ML pipeline: trained ResNet-18 on Snapshot Serengeti predator/non-predator images, then fine-tuned on 300 manually labeled Path of Titans screenshots to reduce domain shift.
Model-selection shift: learning rate 1e-4 was strongest for the nearly 10k-image balanced Serengeti run and is still strongest by PoT validation accuracy, but the live agent prioritizes predator recall, so its operating point is chosen on recall rather than accuracy (see item 8).
CV: live screen capture + frame stats + HUD/overlay visibility.
Search: wiki RAG path in src/wiki_rag.py.
Letta role: planned main decision/memory layer; current repo has Letta-shaped tools/stubs.
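
The division of labor above (perception feeding a decision layer that picks an action) can be sketched as follows. This is a minimal illustration only: the names Personality, Observation, and decide, the brightness stat, and the rule inside decide are all hypothetical and are not taken from the repo. The decide function marks the slot where the planned Letta session would sit.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Personality:
    """Hypothetical per-dinosaur trait (not a repo class)."""
    name: str
    flee_threshold: float  # flee when predator probability >= this value


@dataclass
class Observation:
    """Stand-in for what the CV/ML layers hand to the decision layer."""
    predator_prob: float  # classifier output in [0, 1]
    brightness: float     # example frame stat, assumed to be in [0, 1]


def decide(obs: Observation, who: Personality) -> Tuple[str, str]:
    """Placeholder decision layer returning (thought, action).

    A real Letta session is planned for this slot; the rules below
    exist only to make the sketch runnable.
    """
    if obs.predator_prob >= who.flee_threshold:
        return (f"{who.name}: that looks dangerous", "flee")
    if obs.brightness < 0.2:
        return (f"{who.name}: too dark to see, staying put", "idle")
    return (f"{who.name}: coast is clear", "graze")


# Same observation, two personalities, two different actions.
obs = Observation(predator_prob=0.35, brightness=0.6)
skittish = Personality("Skittish", flee_threshold=0.20)
bold = Personality("Bold", flee_threshold=0.60)
for dino in (skittish, bold):
    thought, action = decide(obs, dino)
    print(thought, "->", action)
```

Keeping the threshold on the personality rather than in the classifier is one way to get different behavior per dinosaur from a single shared model.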

6) Datasets

Primary: Snapshot Serengeti (Dryad CSV workflow documented in README).
Serengeti split: 6,529 training images and 2,142 validation images from the existing manifest split.
Domain adaptation set: 300 labeled Path of Titans screenshots (predator / non_predator).
PoT split: 240 training screenshots and 60 validation screenshots.
Verification holdout: 10 Path of Titans test images (filename labels plus manual labels).
Optional later: larger Kaggle packs (deferred for size/bandwidth reasons).
Modality: image + CSV metadata now; text for RAG; gameplay frames later.
Preprocessing: prepare_data -> manifest/splits -> local JPEGs -> train/eval scripts.
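
The 240/60 PoT split corresponds to an 80/20 holdout. A minimal sketch of a seeded, reproducible split is shown below, assuming a manifest of (path, label) rows. The function name split_manifest, the seed, and the placeholder filenames and labels are illustrative assumptions; how the repo's prepare_data step actually splits is not stated here.

```python
import random


def split_manifest(rows, val_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed and return (train_rows, val_rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # local RNG, so global state is untouched
    n_val = round(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]


# Placeholder manifest: 300 made-up filenames with alternating labels.
manifest = [
    (f"pot_{i:04d}.jpg", "predator" if i % 2 else "non_predator")
    for i in range(300)
]
train_rows, val_rows = split_manifest(manifest)
print(len(train_rows), len(val_rows))  # 240 60
```

Fixing the seed keeps the validation set identical across runs, so results from different learning rates stay comparable.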

7) Evaluation plan

Image metrics: accuracy, F1 (especially predator class), confusion matrix.
Safety goal: prioritize predator recall (fewer false negatives), even if false positives increase.
Thresholding: use an adjustable predator-probability threshold instead of a fixed 0.50.
Baselines: a rule/heuristic classifier and a no-training majority-class baseline, compared against the trained runs.
Agent checks: loop stability, thought/action alignment, latency, and safe control behavior.
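
The image metrics and the majority baseline above can be computed from a confusion matrix as in the sketch below. The helper names and the toy labels are illustrative assumptions, not repo code or repo results.

```python
from collections import Counter


def confusion(y_true, y_pred, positive="predator"):
    """Return (tp, fp, fn, tn) with `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def majority_baseline(train_labels, n):
    """No-training baseline: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


# Toy labels for illustration only.
y_true = ["predator"] * 4 + ["non_predator"] * 6
y_model = ["predator"] * 3 + ["non_predator", "predator"] + ["non_predator"] * 5
y_base = majority_baseline(y_true, len(y_true))

print("model   ", metrics(*confusion(y_true, y_model)))
print("baseline", metrics(*confusion(y_true, y_base)))
```

On this toy data the baseline still reaches 0.6 accuracy while its predator recall is 0.0, which is why the plan reports recall and F1 alongside accuracy.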

8) Current progress

Live screen feed: working via run_paleo_live.py, serve_companion.py, and overlay path.
Keyboard output: prototype control path works with --enable-control + emergency stop.
Agent loop code: Instinct Agent + Primal Mind + action mapping + scenario simulation exist.
Training path: Serengeti -> PoT fine-tune pipeline is running end-to-end with saved checkpoints and metrics.
Latest Serengeti result: best 15-epoch real-image run is ResNet-18 lr=1e-4 + augmentation with 0.9118 validation accuracy and 0.9206 predator recall on the 2,142-image validation split.
300-image PoT result: lr=1e-4 is the accuracy pick on the 60-image validation split (0.767 accuracy, 0.742 predator recall).
Agent safety result: class weighting alone did not fix false negatives at the default 0.50 threshold, so the live-agent operating point switched to lr=3e-5, predator class weight 3.0, and threshold 0.20. This moved predator recall on the 10-image holdout from 0.571 to 0.714.
Safety controls added: inference now supports predator threshold tuning and threshold-sweep metrics for recall/precision tradeoffs.
Main gap: real Letta session integration in the middle of the live loop.
Still pending: full Path of Titans in-game smoke test (focused game window + simple walk control).
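
The threshold tuning described above trades false negatives for false positives. The reported holdout recalls of 0.571 and 0.714 are consistent with 4 of 7 and then 5 of 7 predators detected. That is an inference from the rounded values, not a count stated in the report. The sketch below shows a threshold sweep on made-up probabilities. The function names and all numbers in it are illustrative assumptions, not outputs of the repo's inference script.

```python
def predict(probs, threshold):
    """Flag an image as predator when its probability meets the threshold."""
    return [p >= threshold for p in probs]


def sweep(probs, is_predator, thresholds):
    """Return one dict per threshold with recall, precision, fn, and fp."""
    rows = []
    for th in thresholds:
        pred = predict(probs, th)
        tp = sum(p and t for p, t in zip(pred, is_predator))
        fp = sum(p and not t for p, t in zip(pred, is_predator))
        fn = sum((not p) and t for p, t in zip(pred, is_predator))
        rows.append({
            "threshold": th,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "fn": fn,
            "fp": fp,
        })
    return rows


# Made-up predator probabilities: first five are predators, last five are not.
probs = [0.90, 0.65, 0.45, 0.25, 0.10, 0.55, 0.30, 0.15, 0.05, 0.02]
is_predator = [True] * 5 + [False] * 5

results = sweep(probs, is_predator, thresholds=[0.50, 0.20])
for row in results:
    print(row)
```

On this toy data, dropping the threshold from 0.50 to 0.20 cuts missed predators from 3 to 1 while false alarms rise from 1 to 2, the same direction of tradeoff the live-agent operating point accepts.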

9) Next-step plan

Wire Letta as the real memory/reasoning/decision layer.
Run in-game tests: confirm capture sees PoT and control can perform simple movement safely.
Add report-ready experiment tables/figures from local runs.
Continue improving behavior quality and realism after Letta hookup is stable.