
Basketball Learning Robot

I've been tinkering with a small but ambitious project for the last 26 hours straight... a basketball-playing robot simulation in Rust. It's equal parts robotics, reinforcement learning, and real-time graphics, and it's been a great excuse to stress-test my tooling choices and my understanding of modern RL and robotics systems. I used a fair bit of OpenAI Codex to generate some of the joints and do the math... it didn't really work as expected, but it definitely helped me get some of the components in place. The choice of Zenoh, for example, was inspired by copper-rs, so that training doesn't block the rendering loop; I tried many iterations of LLM-generated options that failed, so in the end I had to wire it up by hand.

You can visit the project at https://robot.zeyaddeeb.com and/or see the code on GitHub. To be honest, I don't know how long it's gonna take to actually drop a ball in the hoop, so check back later to see if it's still fumbling.

Tooling Choices

Why Bevy for the simulation?

Bevy is a modern game engine with an ECS (Entity-Component-System) architecture. That matters because in robotics, the world is a lot of entities: joints, limbs, collisions, sensors, and state. ECS makes it natural to:

  • Separate physics, control, and rendering.

  • Run systems deterministically.

  • Inspect state when training goes sideways.

I'm using Bevy not just for visuals but as a real simulation runtime. Being able to see the robot move, fail, and recover in real time makes debugging and intuition-building much easier.
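To make the ECS split concrete, here's a minimal sketch of how that separation looks in Bevy, assuming a recent Bevy release (the 0.11+ `add_systems`/`FixedUpdate` API). The `Joint` component and the system names are made up for illustration, not the project's actual code.

```rust
use bevy::prelude::*;

// Illustrative component: one actuated joint with its commanded torque.
#[derive(Component)]
struct Joint {
    angle: f32,
    torque: f32,
}

const DT: f32 = 1.0 / 64.0; // Bevy's default fixed timestep

// Control: write torques from whatever the policy decided this step.
fn apply_policy(mut joints: Query<&mut Joint>) {
    for mut joint in joints.iter_mut() {
        joint.torque = 0.1; // placeholder; the real action comes from the actor network
    }
}

// Physics: integrate joint state on the fixed schedule for determinism.
fn physics_step(mut joints: Query<&mut Joint>) {
    for mut joint in joints.iter_mut() {
        joint.angle += joint.torque * DT;
    }
}

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        // Control and physics run on FixedUpdate; rendering stays on Bevy's own loop.
        .add_systems(FixedUpdate, (apply_policy, physics_step).chain())
        .run();
}
```

Keeping control and physics on the fixed schedule is what makes the "run systems deterministically" point above practical: rendering can drop frames without changing what the policy sees.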

Why Candle for the RL part?

Candle is a Rust-native ML framework that feels lightweight and composable. I chose it for a few reasons:

  • No Python bridge in the critical path. I can keep the training loop and the simulation in the same language.

  • Tight control over model structure and data flow.

  • Simple deployment... I can bundle everything into a single container and run it.

Candle isn't as feature-rich as PyTorch, but for a project like this (SAC/DDPG-style actors and critics), it's more than enough. And the iteration speed is great when you're experimenting.
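For a sense of what "lightweight and composable" means in practice, here's a rough sketch of a small actor network in Candle (candle-core + candle-nn), using the usual VarMap/VarBuilder setup. The layer sizes, names, and dimensions are my own illustration, not the project's.

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{linear, Linear, Module, VarBuilder, VarMap};

// A small MLP actor: observations in, bounded joint torques out.
struct Actor {
    l1: Linear,
    l2: Linear,
    out: Linear,
}

impl Actor {
    fn new(obs_dim: usize, act_dim: usize, vb: VarBuilder) -> Result<Self> {
        Ok(Self {
            l1: linear(obs_dim, 256, vb.pp("l1"))?,
            l2: linear(256, 256, vb.pp("l2"))?,
            out: linear(256, act_dim, vb.pp("out"))?,
        })
    }

    // tanh squashes the output into [-1, 1], which maps cleanly to torque limits.
    fn forward(&self, obs: &Tensor) -> Result<Tensor> {
        let x = self.l1.forward(obs)?.relu()?;
        let x = self.l2.forward(&x)?.relu()?;
        self.out.forward(&x)?.tanh()
    }
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, DType::F32, &device);
    let actor = Actor::new(32, 8, vb)?;

    let obs = Tensor::zeros((1, 32), DType::F32, &device)?;
    let action = actor.forward(&obs)?;
    println!("action shape: {:?}", action.dims());
    Ok(())
}
```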

How training works (high level)

The robot runs in a physics simulation. Each step (sketched in code after this list):

  1. It observes the world state.

  2. The policy produces an action (joint torques).

  3. The environment updates and returns a reward.

  4. The transition is stored and used for learning.
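Here's the same loop in simplified Rust. `Env`, `policy`, and the replay storage are stand-ins for the real simulation, actor, and buffer; they only exist to show the data flow, and the toy dynamics and reward are made up.

```rust
// One transition = everything a single environment step produces.
struct Transition {
    obs: Vec<f32>,
    action: Vec<f32>,
    reward: f32,
    next_obs: Vec<f32>,
    done: bool,
}

// Toy environment stub standing in for the physics simulation.
struct Env { state: Vec<f32> }

impl Env {
    fn observe(&self) -> Vec<f32> { self.state.clone() }

    // Apply joint torques, advance physics, return (next_obs, reward, done).
    fn step(&mut self, action: &[f32]) -> (Vec<f32>, f32, bool) {
        for (s, a) in self.state.iter_mut().zip(action) { *s += 0.01 * *a; }
        let reward = -self.state.iter().map(|s| s * s).sum::<f32>();
        (self.state.clone(), reward, false)
    }
}

// Placeholder actor; the real one is a neural network.
fn policy(obs: &[f32]) -> Vec<f32> {
    obs.iter().map(|o| (-*o).clamp(-1.0, 1.0)).collect()
}

fn main() {
    let mut env = Env { state: vec![0.5; 8] };
    let mut replay: Vec<Transition> = Vec::new();

    for _ in 0..1_000 {
        let obs = env.observe();                          // 1. observe the world state
        let action = policy(&obs);                        // 2. policy produces joint torques
        let (next_obs, reward, done) = env.step(&action); // 3. environment updates, returns reward
        replay.push(Transition { obs, action, reward, next_obs, done }); // 4. store the transition
        // ...periodically sample a batch from `replay` and run a gradient step.
    }
    println!("stored {} transitions", replay.len());
}
```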

I built two paths: a native training loop for speed and a WASM path so the simulation can run in the browser while a separate WebSocket server provides actions.

RL algorithms (DDPG + SAC)

I’m using two off‑policy actor‑critic algorithms: DDPG (Deep Deterministic Policy Gradient) and SAC (Soft Actor‑Critic). Both are a good fit for continuous control because they learn a policy (the actor) and a value estimate (the critic) from replayed experience.

  • DDPG gives me a deterministic policy that’s simple and fast to evaluate. It’s great for baseline stability once the reward shaping is sane, but it can be brittle if exploration or rewards are off.

  • SAC adds an entropy term that rewards exploration. In practice it’s more stable across different reward scales and keeps training moving even when the policy starts to get stuck.

In my setup, both algorithms share the same simulation and observation pipeline. I can flip between them depending on whether I want faster convergence (DDPG) or more robust exploration (SAC). It’s been useful to compare them side‑by‑side because they fail in different ways, and that tells me a lot about the environment design.
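The conceptual difference shows up most clearly in the critic's target. Below is an illustrative sketch of the per-transition targets, with toy numbers; the functions and values are mine, not the project's, but the formulas are the standard ones for each algorithm.

```rust
// DDPG: y = r + γ · Q'(s', μ'(s')), using the deterministic next action from the target actor.
fn ddpg_target(reward: f32, done: bool, gamma: f32, q_next: f32) -> f32 {
    reward + if done { 0.0 } else { gamma * q_next }
}

// SAC: y = r + γ · (min(Q'_1, Q'_2) − α · log π(a'|s')), for a sampled next action a'.
// The −α·log π term is the entropy bonus that keeps exploration alive.
fn sac_target(
    reward: f32,
    done: bool,
    gamma: f32,
    q1_next: f32,
    q2_next: f32,
    log_prob_next: f32, // log π(a'|s')
    alpha: f32,         // entropy temperature
) -> f32 {
    let soft_q = q1_next.min(q2_next) - alpha * log_prob_next;
    reward + if done { 0.0 } else { gamma * soft_q }
}

fn main() {
    // Toy numbers, just to show the entropy term shifting the SAC target.
    println!("ddpg: {:.3}", ddpg_target(1.0, false, 0.99, 5.0));
    println!("sac:  {:.3}", sac_target(1.0, false, 0.99, 5.0, 5.2, -1.3, 0.2));
}
```

Since everything else (replay buffer, observations, reward) is shared, swapping the target computation and the actor update is most of what flipping between the two algorithms amounts to.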

WebAssembly + WebSocket bridge

To make the demo accessible, I compile the simulation to WebAssembly so it can run in the browser. The UI and rendering stay local, but the policy inference runs in a WebSocket server. That gives me a clean split:

  • The WASM app handles physics, visuals, and observation extraction.

  • The WS server handles actions and training, so I can update models without rebuilding the frontend.

This setup keeps the browser experience smooth while letting the training loop evolve independently, a practical middle ground between “all-in-browser” and full native simulation.
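To show what crosses that boundary, here's my guess at the shape of the bridge messages as plain serde structs, round-tripped through JSON. The field names and the use of serde_json are assumptions for illustration, not the project's actual wire protocol.

```rust
use serde::{Deserialize, Serialize};

// Sent from the WASM simulation to the WebSocket server each step.
#[derive(Serialize, Deserialize, Debug)]
struct ObservationMsg {
    step: u64,
    obs: Vec<f32>, // joint angles, velocities, ball position, ...
    reward: f32,
    done: bool,
}

// Sent back from the server: one torque per actuated joint.
#[derive(Serialize, Deserialize, Debug)]
struct ActionMsg {
    step: u64,
    torques: Vec<f32>,
}

fn main() -> serde_json::Result<()> {
    // Browser/WASM side: serialize the current observation for the WS server.
    let out = ObservationMsg { step: 42, obs: vec![0.1, -0.3, 0.7], reward: -0.05, done: false };
    let wire = serde_json::to_string(&out)?;

    // Server side: reply with an action; the frontend never needs the model itself.
    let reply = serde_json::to_string(&ActionMsg { step: 42, torques: vec![0.2, -0.1, 0.0] })?;
    let action: ActionMsg = serde_json::from_str(&reply)?;
    println!("sent {} bytes, got {:?}", wire.len(), action);
    Ok(())
}
```

Because the model lives entirely behind this message boundary, I can retrain or swap the policy on the server without touching the WASM bundle.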

If you're curious about the code or want to compare notes, feel free to reach out. I learned quite a bit and I'm still learning... so no hate mail about why one choice was made and not another!