Technical Overview

PALEO Technical Dashboard

Model, data, and integration details for development-focused readers.

IN PROGRESS: Training, fine-tuning, safety thresholding, and live runtime paths are working; Letta-in-the-loop remains the main integration gap.

Current Milestone

Check-in 2

Progress update published with live runtime and command guidance.

Primary Approach

CV + Agentic AI

ResNet-18 behavior classification + Letta memory-driven agent loop.

Repository State

Models + Pages

src/, scripts/, tests/, and docs/ are in place, with saved checkpoints, metrics, and the Pages site deployed.

Planned Agent Scale

1 Agent / Dinosaur

Each dinosaur keeps identity, needs, risks, and recent action memory blocks.
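The per-dinosaur memory layout described above can be pictured as a small record with one field per block. The following is a minimal sketch only: the class name `DinosaurMemory`, the field types, and the five-action window are assumptions made for illustration, and it does not use or represent the Letta API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class DinosaurMemory:
    """Hypothetical container for one dinosaur's memory blocks."""

    identity: str
    needs: Dict[str, float] = field(default_factory=dict)
    risks: List[str] = field(default_factory=list)
    # Bounded window: only the most recent actions are kept (size is assumed).
    recent_actions: Deque[str] = field(default_factory=lambda: deque(maxlen=5))

    def remember(self, action: str) -> None:
        """Append an action; the oldest one drops out once the window is full."""
        self.recent_actions.append(action)


memory = DinosaurMemory(identity="example herbivore", needs={"hunger": 0.4})
for action in ["graze", "walk", "drink", "graze", "flee", "hide"]:
    memory.remember(action)

print(list(memory.recent_actions))
```

A bounded window keeps the recent-action block short no matter how long a session runs, which is one plausible way to keep the context passed to an agent small.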

Technical Navigation

Dashboard Navigation

Project Check-in 1

Current write-up with active experiment outputs and visuals.

Project Check-in 2

Current progress update with live pipeline status and next steps.

Model Results Dashboard

Current curves and comparison charts; next update adds expanded metric tables.

Agent Demo Gallery

Next planned: clips with thought logs and behavior snapshots.

Architecture Viewer

Next planned: perceive -> decide -> act -> remember flow.

Data and Models

Technical Direction

Training path: Snapshot Serengeti pretraining on a nearly 10k-image balanced real-image set, then Path of Titans fine-tuning on 300 labeled screenshots.
Serengeti split: 6,529 training images and 2,142 validation images from the existing manifest split.
PoT split: 240 training screenshots and 60 validation screenshots from an 80/20 split of the 300 labeled screenshots.
Baseline: OpenCV rule policy for first benchmark.
Main model: ResNet-18 transfer learning for predator vs non-predator prediction.
Accuracy pick: lr=1e-4 stayed strongest after the 300-image PoT fine-tune, with validation accuracy 0.767 on the 60-image split.
Safety pick: lr=3e-5 + predator class weight 3.0 + threshold 0.20.
Why we shifted: class weighting alone did not fix the false-negative problem at the default threshold, and gameplay should prefer false positives over missed predators.
Holdout result (10 PoT images): the safety operating point raised predator recall from 0.571 to 0.714, at an accuracy of 0.70.
Agent orchestration: one Letta agent per dinosaur.
Delivered outputs: convergence curves, sensitivity sweeps, confusion matrices, threshold sweeps, and qualitative failure cases.
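To illustrate why lowering the decision threshold trades false positives for fewer missed predators, here is a minimal sketch in plain Python. The probabilities and labels are toy values invented for this example, not the PoT holdout data, and the helper names `predict_predator` and `recall_and_accuracy` are hypothetical rather than functions from the repository.

```python
from typing import List, Sequence, Tuple


def predict_predator(probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Flag an image as predator when its predicted probability meets the threshold."""
    return [p >= threshold for p in probs]


def recall_and_accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (predator recall, overall accuracy) for paired labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, correct / len(y_true)


# Toy data (illustrative only): four predator images, four non-predator images.
y_true = [True, True, True, True, False, False, False, False]
probs = [0.90, 0.60, 0.35, 0.25, 0.30, 0.10, 0.15, 0.05]

default_point = recall_and_accuracy(y_true, predict_predator(probs, threshold=0.5))
safety_point = recall_and_accuracy(y_true, predict_predator(probs, threshold=0.20))

print("default threshold 0.50:", default_point)
print("safety threshold 0.20:", safety_point)
```

In this toy case the lower threshold recovers the two low-confidence predators at the cost of one false alarm, which mirrors the stated preference for false positives over missed predators.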

Reproducibility

Reproducible Commands

python scripts/run_pipeline.py
python -m unittest discover -s tests -p "test_*.py"

Core smoke-test and verification commands are listed here; the full experiment and report commands remain in the README.

Run Now

In-Game and HUD Commands

Recommended in-game overlay (advice mode):
py -3 scripts/run_paleo_overlay.py --mode advice --classifier-checkpoint results/experiments/serengeti_disk_resnet18/resnet18_serengeti_disk.pt
Guarded control smoke test (overlay):
py -3 scripts/run_paleo_overlay.py --mode control --enable-control --classifier-checkpoint results/experiments/serengeti_disk_resnet18/resnet18_serengeti_disk.pt
PALEO.exe-style browser HUD runtime:
py -3 scripts/run_paleo_live.py --classifier-checkpoint results/experiments/serengeti_disk_resnet18/resnet18_serengeti_disk.pt

Use the overlay for active in-game runs; keep the F12 emergency stop ready during control tests.

Delivery Tracker

Planned Features

DONE: GitHub Pages deployment from pages/ via GitHub Actions.

DONE: Public homepage + technical + check-in tab structure.

DONE: Documentation refinement and technical page upgrades.

DONE: Dataset ingestion pipeline for selected Serengeti and Path of Titans training subsets.

DONE: Classifier training with sensitivity experiments, baseline comparison, and confusion matrices.

DONE: Safety threshold sweep for predator recall on the 10-image Path of Titans holdout.

IN PROGRESS: Integrated agent loop with confidence thresholds and fallback actions.

PLANNED: Full in-game Path of Titans smoke test with focused game window and simple movement.
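The in-progress agent loop item (confidence thresholds plus fallback actions) could look roughly like the sketch below. It is an assumption-laden outline, not the project's implementation: the action names, the `decide` and `run_step` helpers, and the use of a plain list as memory are all hypothetical, and only the 0.20 threshold is taken from the safety pick reported on this page.

```python
import math
from typing import Callable, List, Optional

PREDATOR_THRESHOLD = 0.20          # safety operating point reported on this page
FALLBACK_ACTION = "hold_position"  # hypothetical safe default


def decide(predator_prob: Optional[float]) -> str:
    """Map a classifier probability to an action, falling back on bad input."""
    if predator_prob is None or math.isnan(predator_prob):
        return FALLBACK_ACTION
    if not 0.0 <= predator_prob <= 1.0:
        return FALLBACK_ACTION
    if predator_prob >= PREDATOR_THRESHOLD:
        return "flee"
    return "forage"


def run_step(perceive: Callable[[], float], memory: List[str]) -> str:
    """One perceive -> decide -> act -> remember step; acting is stubbed out."""
    try:
        prob: Optional[float] = perceive()
    except Exception:
        prob = None  # perception failed, so the fallback action is used
    action = decide(prob)
    memory.append(action)  # remember the chosen action
    return action


def broken_camera() -> float:
    raise RuntimeError("no frame captured")


log: List[str] = []
print(run_step(lambda: 0.60, log))
print(run_step(lambda: 0.05, log))
print(run_step(broken_camera, log))
```

Routing every failure path through a single fallback action keeps the control mode predictable when the classifier output is missing or malformed.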

Timeline

Roadmap Snapshot

Week 1: finalize dataset selection and preprocessing scripts.
Week 2: implement OpenCV baseline and first ResNet-18 run.
Week 3: run augmentation and hyperparameter sensitivity.
Week 4: integrate classifier with Letta agent loop.
Week 5: finalize report visuals and project outputs.

Technical Visuals and Model Evidence

Placeholder: architecture diagram (perceive -> decide -> act -> remember)

Placeholder: dinosaur thought/action timeline

Figure: Latest Serengeti validation accuracy by experiment.

Figure: Latest Serengeti confusion matrices across all evaluated experiments.

Figure: Path of Titans 300-screenshot validation confusion matrix comparison (60-image validation split).

Figure: Baseline vs safety-tuned confusion matrices on the 10-image Path of Titans holdout (agent safety operating point).