PALEO Project Check-in 1

Complete web version of the check-in response, with current experiment artifacts and upcoming technical deliverables.

COMPLETE Core written check-in content is now published here.

Project Snapshot

  • Title: PALEO (with Primal Mind)
  • Team: World's Finest
  • Members: Laura Wetherhold, Alexus Aguirre Arias
  • Target: Dinosaur survival agent with interpretable thought/action loops.

Current Status

DONE Scaffold, docs, deployment, Serengeti training, and PoT fine-tuning are ready.

IN PROGRESS Letta-in-the-loop integration and full in-game smoke testing are next.

This page reflects the latest project state, including saved model metrics and figures.

Check-in Questions (1 to 10)

1) Project title and team

Project title: PALEO (with Primal Mind)

Team: World's Finest (Laura Wetherhold, Alexus Aguirre Arias)

2-4) Abstract and problem definition

PALEO addresses a common issue in game AI: predictable and non-interpretable NPC behavior. The system uses computer vision plus behavior modeling to infer high-level state (threat, urgency, risk posture), then generates explainable thought logs and aligned actions.

  • Input: real-time game screen captures.
  • Output: thought logs, simulated controls, and updated per-dinosaur memory state.
  • Task types: machine learning, computer vision, and agentic automation.
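
The input/output contract above can be sketched as a single decision step. This is a minimal illustration, not the project's implementation: the class names (PerceivedState, DinoMemory), the step function, and the 0.7 / 0.5 cutoffs are all hypothetical, and the perception stage that would fill PerceivedState from a screen capture is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PerceivedState:
    """High-level state inferred from one screen capture (hypothetical fields)."""
    threat: float       # 0.0 (no threat) to 1.0 (predator close)
    urgency: float      # 0.0 to 1.0, e.g. hunger or thirst pressure
    risk_posture: str   # "cautious" or "bold"


@dataclass
class DinoMemory:
    """Per-dinosaur memory state, updated on every step."""
    events: list = field(default_factory=list)


def step(state: PerceivedState, memory: DinoMemory) -> tuple[str, str]:
    """Return (thought_log, action) and record the decision in memory."""
    # Cutoffs below are illustrative assumptions, not tuned project values.
    if state.threat >= 0.7:
        action, reason = "flee", "threat is high"
    elif state.urgency >= 0.5 and state.risk_posture == "bold":
        action, reason = "forage", "needs are pressing and posture allows risk"
    else:
        action, reason = "hold", "no strong signal"
    thought = f"I will {action} because {reason}."
    memory.events.append((action, reason))
    return thought, action
```

Calling step(PerceivedState(0.9, 0.2, "cautious"), DinoMemory()) returns the action "flee" together with a one-sentence thought log, and the same decision is appended to the memory object so that later steps can refer to it.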

5-6) Dataset and model choices

  • Datasets: Snapshot Serengeti for real-image training + 300 manually labeled Path of Titans screenshots for domain adaptation.
  • Preprocessing: unified predator/non-predator label schema, augmentation, balancing, and train/validation splits.
  • Baseline: OpenCV rule-based policy.
  • Main model: ResNet-18 transfer learning for behavior intent classification.
  • Agent orchestration: one Letta agent per dinosaur with memory blocks for needs/events/risk.
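
The label unification and train/validation split from the preprocessing bullet could look roughly like the sketch below. It assumes samples arrive as (path, species) pairs; the PREDATORS set, the function names, and the 20% validation fraction are hypothetical choices for illustration, and the project's real schema may differ.

```python
import random
from collections import defaultdict

# Hypothetical species-to-class mapping; the project's real schema may differ.
PREDATORS = {"lion", "hyena", "leopard", "cheetah"}


def unify_label(species: str) -> str:
    """Collapse a species name into the binary predator/non-predator schema."""
    return "predator" if species.lower() in PREDATORS else "non_predator"


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, species) pairs so each class keeps roughly the same ratio.

    Note: a class with a single sample ends up entirely in validation.
    """
    by_class = defaultdict(list)
    for path, species in samples:
        label = unify_label(species)
        by_class[label].append((path, label))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for items in by_class.values():
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

Splitting per class rather than over the whole pool keeps the rarer predator class represented in validation, which matters when the classes are imbalanced.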

7) Target users and impact

  • Game developers: behavior stress-testing for map/resource balancing.
  • Players: survival-loop assistance and better accessibility support.
  • Impact: less scripted, more interpretable and lifelike agent behavior.

8) Technical plan and timeline

  • Week 1: finalize dataset selection and preprocessing scripts.
  • Week 2: implement OpenCV baseline + first ResNet-18 training run.
  • Week 3: sensitivity and hyperparameter sweep.
  • Week 4: integrate classifier into the live agent loop.
  • Week 5: produce final figures/tables, safety threshold analysis, and polish report.

9-10) Progress and challenges

The repository has moved past the scaffold stage: it now contains src/, scripts/, tests/, and docs/, saved checkpoints and metrics, reproducible commands in the README, and a Pages deployment.

  • Progress honesty: model results exist, but the Path of Titans holdout is still tiny and should be treated as an early safety check.
  • Challenges: class imbalance, domain shift, hardware limits, label noise, low-confidence oscillations, and false negatives.
  • Mitigations: weighted loss, balancing, confidence thresholds, threshold sweeps, and safe-action fallback logic.
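
Two of the mitigations above, confidence thresholds with a safe-action fallback and threshold sweeps, can be sketched as follows. This is an illustration under stated assumptions: the classifier is taken to output a predator probability, and the function names, action names ("flee", "continue", "safe_hold"), default threshold, and margin are all hypothetical.

```python
def choose_action(p_predator: float, threshold: float = 0.5,
                  margin: float = 0.1) -> str:
    """Map a predator probability to an action, with a safe fallback band.

    Probabilities within `margin` of the threshold are treated as
    low-confidence and mapped to a safe default instead of flipping
    between actions from frame to frame.
    """
    if p_predator >= threshold + margin:
        return "flee"
    if p_predator <= threshold - margin:
        return "continue"
    return "safe_hold"


def sweep_thresholds(probs, labels, thresholds):
    """Count false negatives and false positives at each threshold.

    labels use 1 for predator and 0 for non-predator; a false negative
    is a predator scored below the threshold.
    """
    rows = []
    for t in thresholds:
        fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < t)
        fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= t)
        rows.append((t, fn, fp))
    return rows
```

For a survival agent a missed predator is usually the costlier error, so a sweep like this would typically be read with the false-negative column first and the threshold chosen accordingly.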

Current Artifact Snapshot

These cards summarize the current outputs and next artifact upgrades.

Training Curves Panel

Now populated by convergence curves from the latest ResNet-18 sweep.

Sensitivity Analysis Table

Now represented with learning-rate sensitivity across augmented runs.

Baseline Comparison

Now represented with final comparison across baseline and model variants.

Qualitative Case Gallery

Placeholder for success/failure behavior examples and short analyses.

Memory Block Visualizer

Placeholder for per-dinosaur memory state view (needs, threat memory, actions).

Demo Clip Embed

Placeholder for in-engine clips showing perceive-decide-act behavior loop.

Training Curves Panel

Training and validation loss curves for experiment runs

Best selected run: ResNet-18, lr=1e-4, augmentation on, 15 epochs.

Sensitivity Analysis Table

Validation accuracy sensitivity across learning-rate choices

Run             Notes
LR=1e-3 (aug)   Stronger early gains, less stable at later epochs.
LR=1e-4 (aug)   More stable convergence; selected as the best configuration.

Baseline Comparison

Final validation accuracy comparison across baseline and ResNet variants

Baseline versus model variants from the latest sweep.

Confusion matrix for best selected ResNet-18 model: lr=1e-4 with augmentation at 15 epochs

Planning Notes