Technical Overview

PALEO Technical Dashboard

Model, data, and integration details for development-focused readers.

IN PROGRESS: Training, fine-tuning, safety thresholding, and live runtime paths are working; Letta-in-the-loop remains the main integration gap.

Current Milestone

Check-in 2

Progress update published with live runtime and command guidance.

Primary Approach

CV + Agentic AI

ResNet-18 behavior classification + Letta memory-driven agent loop.

Repository State

Models + Pages

The repository contains src/, scripts/, tests/, docs/, saved checkpoints, and metrics; the Pages site is deployed.

Planned Agent Scale

1 Agent / Dinosaur

Each dinosaur keeps identity, needs, risks, and recent action memory blocks.
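
The four memory blocks named above can be pictured as a small record per dinosaur. The following is a minimal sketch in plain Python, not the Letta API: the class name `DinosaurMemory`, its field types, and the five-entry history limit are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class DinosaurMemory:
    """Hypothetical per-dinosaur memory layout (illustrative names only)."""

    identity: str
    needs: dict[str, float] = field(default_factory=dict)
    risks: list[str] = field(default_factory=list)
    # Bounded history: only the most recent actions are kept (limit is assumed).
    recent_actions: deque[str] = field(default_factory=lambda: deque(maxlen=5))

    def remember(self, action: str) -> None:
        """Record an action; the oldest entry drops out once the deque is full."""
        self.recent_actions.append(action)


memory = DinosaurMemory(identity="dino-1", needs={"hunger": 0.4})
for step in range(7):
    memory.remember(f"action-{step}")
print(list(memory.recent_actions))
```

Keeping the action history bounded is one simple way to make "recent" concrete; the real memory blocks may be structured differently.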

Data and Models

Technical Direction

  • Training path: Snapshot Serengeti pretraining on a nearly 10k-image balanced real-image set, then Path of Titans fine-tuning on 300 labeled screenshots.
  • Serengeti split: 6,529 training images and 2,142 validation images from the existing manifest split.
  • PoT split: 240 training screenshots and 60 validation screenshots from an 80/20 split of the 300 labeled screenshots.
  • Baseline: OpenCV rule-based policy as the first benchmark.
  • Main model: ResNet-18 transfer learning for predator vs non-predator prediction.
  • Accuracy pick: lr=1e-4 stayed strongest after the 300-image PoT fine-tune, with validation accuracy 0.767 on the 60-image split.
  • Safety pick: lr=3e-5 + predator class weight 3.0 + threshold 0.20.
  • Why we shifted: class weighting alone did not fix the false-negative problem at the default threshold, and gameplay should prefer false positives over missed predators.
  • Holdout result (10 PoT images): the safety operating point moved predator recall from 0.571 to 0.714, with accuracy 0.70.
  • Agent orchestration: one Letta agent per dinosaur.
  • Delivered outputs: convergence curves, sensitivity sweeps, confusion matrices, threshold sweeps, and qualitative failure cases.
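
The safety pick above lowers the predator decision threshold to 0.20 so that missed predators become rarer at the cost of more false alarms. The sketch below illustrates that trade-off on made-up probabilities; the function names and the sample values are assumptions for illustration and are not project data or project code.

```python
def flag_predator(predator_prob: float, threshold: float) -> bool:
    """Flag a frame as predator when the predicted probability reaches the threshold."""
    return predator_prob >= threshold


def evaluate(probs, labels, threshold):
    """Return predator recall and the false-positive count at a given threshold."""
    tp = fn = fp = 0
    for prob, is_predator in zip(probs, labels):
        flagged = flag_predator(prob, threshold)
        if is_predator and flagged:
            tp += 1
        elif is_predator:
            fn += 1
        elif flagged:
            fp += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"recall": recall, "false_positives": fp}


# Made-up predator probabilities for illustration only; NOT project data.
probs = [0.90, 0.35, 0.25, 0.60, 0.30, 0.10]
labels = [True, True, True, False, False, False]

default_stats = evaluate(probs, labels, threshold=0.50)
safety_stats = evaluate(probs, labels, threshold=0.20)
print(default_stats, safety_stats)
```

On this toy input, the lower threshold catches every predator but flags one additional non-predator, which is the direction of trade the safety pick accepts.
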
Reproducibility

Reproducible Commands

  • python scripts/run_pipeline.py
  • python -m unittest discover -s tests -p "test_*.py"

Core smoke-test and verification commands are listed here; the full experiment and report commands remain in the README.
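
For readers unfamiliar with unittest discovery, the second command collects any module under tests/ whose name matches test_*.py. Below is a minimal sketch of such a module; the file name tests/test_split.py and the helper `split_counts` are hypothetical and do not describe the repository's actual tests.

```python
# Hypothetical file: tests/test_split.py (illustrative, not from the repository)
import unittest


def split_counts(total: int, val_fraction: float) -> tuple[int, int]:
    """Return (train, validation) counts for a simple fractional split."""
    val = round(total * val_fraction)
    return total - val, val


class TestSplitCounts(unittest.TestCase):
    def test_pot_split_is_240_60(self):
        # An 80/20 split of 300 labeled screenshots gives 240 train, 60 validation.
        self.assertEqual(split_counts(300, 0.2), (240, 60))

    def test_counts_sum_to_total(self):
        train, val = split_counts(300, 0.2)
        self.assertEqual(train + val, 300)
```

Because discovery matches on the file name pattern, a module like this would be picked up without being registered anywhere else.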

Run Now

In-Game and HUD Commands

Use the overlay for the active in-game runtime; keep the F12 emergency stop ready during control tests.

Delivery Tracker

Planned Features

DONE: GitHub Pages deployment from pages/ via GitHub Actions.
DONE: Public homepage + technical + check-in tab structure.
DONE: Documentation refinement and technical page upgrades.
DONE: Dataset ingestion pipeline for selected Serengeti and Path of Titans training subsets.
DONE: Classifier training with sensitivity experiments, baseline comparison, and confusion matrices.
DONE: Safety threshold sweep for predator recall on the 10-image Path of Titans holdout.
IN PROGRESS: Integrated agent loop with confidence thresholds and fallback actions.
PLANNED: Full in-game Path of Titans smoke test with focused game window and simple movement.
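
The in-progress agent loop item can be pictured as one perceive, decide, act, remember step with a fallback when perception is unusable. This is a rough sketch under stated assumptions: the action names ("flee", "forage", "hold"), the stub perception function, and the history size are hypothetical; only the 0.20 threshold comes from this page.

```python
from __future__ import annotations

from collections import deque

PREDATOR_THRESHOLD = 0.20  # safety threshold quoted on this page
FALLBACK_ACTION = "hold"   # hypothetical fallback when perception is unusable


def decide(predator_prob: float | None) -> str:
    """Map a predator probability to an action, falling back on bad input."""
    if predator_prob is None or not 0.0 <= predator_prob <= 1.0:
        return FALLBACK_ACTION
    if predator_prob >= PREDATOR_THRESHOLD:
        return "flee"    # hypothetical action name
    return "forage"      # hypothetical action name


def run_step(perceive, act, memory: deque) -> str:
    """One perceive -> decide -> act -> remember step."""
    try:
        prob = perceive()
    except Exception:
        prob = None  # treat any perception failure as "no reading"
    action = decide(prob)
    act(action)
    memory.append(action)
    return action


def make_stub_perceive(readings):
    """Stub perception: yields preset values; None simulates a capture failure."""
    it = iter(readings)

    def perceive():
        value = next(it)
        if value is None:
            raise RuntimeError("no frame captured")
        return value

    return perceive


performed: list[str] = []
memory: deque = deque(maxlen=10)
perceive = make_stub_perceive([0.05, 0.25, None])
for _ in range(3):
    run_step(perceive, performed.append, memory)
print(performed)
```

Routing every failure through a single fallback action keeps the loop's behavior predictable during control tests.
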
Timeline

Roadmap Snapshot

Technical Visuals and Model Evidence
Placeholder: architecture diagram (perceive -> decide -> act -> remember)
Placeholder: dinosaur thought/action timeline
Figure: Latest Serengeti Accuracy by Experiment (validation accuracy by experiment).
Figure: All Serengeti Experiment Confusion Matrices (across all evaluated experiments).
Figure: 300-Screenshot Validation Split (60 images). Path of Titans validation confusion matrix comparison.
Figure: Agent Safety Operating Point (10-image holdout). Baseline vs safety-tuned confusion matrices on 10 Path of Titans holdout images.