Current: curves and comparison charts. Next update: expanded metric tables.
Agent Demo Gallery
Next planned: clips with thought logs and behavior snapshots.
Architecture Viewer
Next planned: perceive → decide → act → remember flow.
Data and Models
Technical Direction
Training path: pretrain on a balanced set of nearly 10k real Snapshot Serengeti images, then fine-tune on 300 labeled Path of Titans screenshots.
Serengeti split: 6,529 training images and 2,142 validation images from the existing manifest split.
PoT split: 240 training screenshots and 60 validation screenshots from an 80/20 split of the 300 labeled screenshots.
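The 80/20 PoT split can be reproduced with a stratified shuffle, so each class keeps roughly the same share in training and validation. This is a sketch under stated assumptions: the filenames, the 100/200 class counts, the `stratified_split` helper, and the fixed seed are all hypothetical and do not come from the project's manifest or split script.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(items: Sequence[str], labels: Sequence[str],
                     val_fraction: float = 0.2, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle each class separately and move about val_fraction of it into validation."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):  # sorted for a deterministic class order
        group = list(by_label[label])
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


items = [f"pot_{i:03d}.png" for i in range(300)]     # hypothetical filenames
labels = ["predator"] * 100 + ["non_predator"] * 200  # placeholder class counts
train, val = stratified_split(items, labels)
print(len(train), len(val))  # 240 60
```

With only 60 validation screenshots, stratifying keeps an unlucky shuffle from noticeably shifting the predator share between the two splits.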
Baseline: rule-based OpenCV policy for the first benchmark.
Main model: ResNet-18 transfer learning for binary predator vs. non-predator classification.
Accuracy pick: lr=1e-4 remained the strongest learning rate after the 300-image PoT fine-tune, reaching 0.767 validation accuracy on the 60-image split.
Why we shifted to a safety operating point: class weighting alone did not fix the false-negative problem at the default threshold, and gameplay should prefer false positives over missed predators.
Holdout result (10 PoT images): the safety operating point raised predator recall from 0.571 to 0.714, at an accuracy of 0.70.
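A safety operating point of this kind amounts to lowering the decision threshold until predator recall is acceptable, accepting more false positives in exchange. A minimal sketch of such a sweep follows. The `min_recall` target, the candidate thresholds, and the toy probabilities are made up for illustration and are not the project's values.

```python
from typing import Sequence, Tuple


def recall_and_accuracy(probs: Sequence[float], labels: Sequence[int],
                        threshold: float) -> Tuple[float, float]:
    """Predator recall and overall accuracy when predicting predator (1) for p >= threshold."""
    tp = fn = correct = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if y == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1  # missed predator: the error gameplay cares most about
        if pred == y:
            correct += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, correct / len(labels)


def pick_safety_threshold(probs, labels, min_recall, candidates):
    """Highest candidate threshold whose predator recall reaches min_recall."""
    for t in sorted(candidates, reverse=True):
        recall, _ = recall_and_accuracy(probs, labels, t)
        if recall >= min_recall:
            return t
    return min(candidates)  # nothing meets the target: use the most sensitive candidate


probs = [0.9, 0.8, 0.65, 0.55, 0.45, 0.3, 0.2, 0.6, 0.35, 0.1]  # toy scores
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]                         # 1 = predator
print(recall_and_accuracy(probs, labels, 0.5))  # recall 4/6, accuracy 0.7
t = pick_safety_threshold(probs, labels, min_recall=0.8, candidates=[0.5, 0.4, 0.3, 0.2])
print(t, recall_and_accuracy(probs, labels, t))  # 0.4, recall 5/6, accuracy 0.8
```

Recall can only stay flat or rise as the threshold drops, so the first candidate that meets the target when scanning downward is the highest one that does, which keeps false positives as low as the target allows. For scale, on a 10-image holdout the reported recalls of 0.571 and 0.714 are consistent with 4 and then 5 of 7 predator images being caught.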
Agent orchestration: one Letta agent per dinosaur.
Guarded control smoke test (overlay): py -3 scripts/run_paleo_overlay.py --mode control --enable-control --classifier-checkpoint results/experiments/serengeti_disk_resnet18/resnet18_serengeti_disk.pt
Use the overlay for active in-game runs; keep the F12 emergency stop ready during control tests.
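The guarded-control idea (actions flow only when `--enable-control` is set, and stop as soon as the emergency key is pressed) can be sketched with a `threading.Event`. This is a hypothetical wrapper, not the internals of `scripts/run_paleo_overlay.py`: `GuardedController`, `act`, and `emergency_stop` are illustrative names, and wiring F12 to `emergency_stop()` would need a keyboard-hook library that is not shown.

```python
import threading
from typing import Callable


class GuardedController:
    """Hypothetical wrapper: forwards actions only while control is enabled and not stopped."""

    def __init__(self, enable_control: bool, send_action: Callable[[str], None]) -> None:
        self.enable_control = enable_control  # mirrors the --enable-control opt-in
        self._stop = threading.Event()        # set once by the emergency-stop handler
        self._send_action = send_action       # e.g. a function that issues a key press

    def emergency_stop(self) -> None:
        """Intended to be called from an F12 hotkey handler (hook not shown)."""
        self._stop.set()

    def act(self, action: str) -> bool:
        """Send the action if allowed; return whether it was sent."""
        if not self.enable_control or self._stop.is_set():
            return False  # observe-only mode or stopped: drop the action
        self._send_action(action)
        return True


sent = []
ctrl = GuardedController(enable_control=True, send_action=sent.append)
ctrl.act("move_forward")  # sent
ctrl.emergency_stop()
ctrl.act("move_forward")  # dropped after the stop
print(sent)  # ['move_forward']
```

Using an `Event` rather than a plain boolean keeps the stop safe to trigger from a separate hotkey thread.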
Delivery Tracker
Planned Features
DONE: GitHub Pages deployment from pages/ via GitHub Actions.
DONE: Public homepage + technical + check-in tab structure.
DONE: Documentation refinement and technical page upgrades.
DONE: Dataset ingestion pipeline for selected Serengeti and Path of Titans training subsets.
DONE: Classifier training with sensitivity experiments, baseline comparison, and confusion matrices.
DONE: Safety threshold sweep for predator recall on the 10-image Path of Titans holdout.
IN PROGRESS: Integrated agent loop with confidence thresholds and fallback actions.
PLANNED: Full in-game Path of Titans smoke test with focused game window and simple movement.
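The in-progress agent loop, following the planned perceive → decide → act → remember flow, can be sketched as one step function per tick. Everything here is an assumption for illustration: the action names, the `flee_at`/`ignore_below` thresholds (placeholders, not the project's safety operating point), and the plain-list memory, which stands in for the per-dinosaur Letta agent without calling any Letta API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical action names; the tracker does not specify the real action set.
FLEE, CONTINUE, FALLBACK = "flee", "continue", "stop_and_recheck"


@dataclass
class AgentMemory:
    """Per-dinosaur log of (predator probability, action); a stand-in for Letta memory."""
    steps: List[Tuple[Optional[float], str]] = field(default_factory=list)


def decide(p_predator: Optional[float], flee_at: float = 0.4, ignore_below: float = 0.2) -> str:
    """Confidence-thresholded policy with a fallback for missing or ambiguous scores."""
    if p_predator is None:         # no usable frame this tick (e.g. game window not focused)
        return FALLBACK
    if p_predator >= flee_at:      # treat as predator: safety-first action
        return FLEE
    if p_predator < ignore_below:  # confidently not a predator
        return CONTINUE
    return FALLBACK                # ambiguous band between the two thresholds


def agent_step(perceive: Callable[[], Optional[float]],
               act: Callable[[str], None],
               memory: AgentMemory) -> str:
    p = perceive()                    # perceive: classifier probability for the current frame
    action = decide(p)                # decide
    act(action)                       # act
    memory.steps.append((p, action))  # remember
    return action


memory = AgentMemory()
performed: List[str] = []
for score in [0.9, 0.05, 0.3, None]:
    agent_step(lambda s=score: s, performed.append, memory)
print(performed)  # ['flee', 'continue', 'stop_and_recheck', 'stop_and_recheck']
```

Routing both missing frames and the ambiguous middle band to a cautious fallback keeps the loop from acting on scores the classifier is unsure about.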
Timeline
Roadmap Snapshot
Week 1: finalize dataset selection and preprocessing scripts.
Week 2: implement OpenCV baseline and first ResNet-18 run.
Week 3: run augmentation and hyperparameter sensitivity experiments.
Week 4: integrate classifier with Letta agent loop.
Week 5: finalize report visuals and project outputs.