Teleoperation latency-jitter report

FR3 (leader) → FR20 (follower) ServoJ teleop loop · Fairino SimMachine pair · measured end-to-end with ROBOT_TRACE=1

TL;DR

The lock-contention hypothesis is not supported. The shared SDK lock (_sdk_lock) on the follower's command-apply path is acquired in ~1 µs (p50 0.001 ms, max 0.01 ms over 869 ServoJ writes). It is essentially uncontended.
The dominant jitter source is callback scheduling on the leader→convert side — the hop sysstate.pre_callback → orch.cb_enter: median 1.2 ms but p99 6.5 ms, max 9.2 ms, contributing the largest share of the end-to-end PDV (jitter). This is Python thread / GIL dispatch latency, not I/O.
Two secondary contributors, comparable in size: the ServoJ XML-RPC round-trip (p99 4.4 ms) and per-command safety validation (validate_command, p99 3.6 ms).
Side finding: only ~29% of teleop setpoints (869 of 3021) actually issued a ServoJ; the other ~71% were rejected by safety.validate_command before reaching hardware — a likely source of motion stutter worth a separate look.
This jitter has a real consequence: on the slow 2-core CI runner it inflates the follower's measured TCP-speed estimate (Δpos/Δt) enough to falsely trip the eef_speed guard and e-stop, even at a gentle commanded velocity. The jitter is not just a smoothness issue — it makes the safety gate flaky.

1 · How this was measured

The full stack was booted headless (no frontend) against two freshly-recreated Fairino SimMachine containers via scripts/profile_teleop_jitter.py, which sets ROBOT_TRACE=1, enters teleoperation, and drives the FR3 leader through 120 back-to-back circle laps over 60 s. The follower mirrors the leader through the normal ServoJ command path.

Every command carries a RobotState that is stamped with time.perf_counter() at each pipeline hop (robot_types.py:stamp, a no-op unless ROBOT_TRACE=1). Stamps are flushed to traces/robot_trace_*.jsonl and reshaped into per-hop deltas by servo7/utils/trace_stats.py. 3021 teleop commands were captured. To isolate the SDK lock specifically, an extra command_pre_lock stamp was added immediately before with self._sdk_lock: and command_lock_acquired immediately after, so lock-wait is separated from validation and from the RPC itself.

Environment: simulated controllers (CNDE realtime stream at 10 ms, publish loop 50 Hz). Absolute numbers will differ on real arms and on faster/slower hosts, but the relative breakdown and the contention conclusions hold. One 46.8 ms startup outlier (first command) is excluded from percentiles below.

2 · The command pipeline

A single teleop command travels leader→follower through these stamped hops (file:line of each stamp):

leader robot node robot_node.py:268 publish.pre_read └ read CNDE state robot_node.py:269 publish.got_state system-state manager system_state_node.py:40 ssm.deserialized system_state.py:69 sysstate.pre_callback ──┐ orchestrator teleop cb orchestrator_node.py:362 orch.cb_enter ◄─┘ ← scheduling gap └ leader→follower map orchestrator_node.py:368 orch.cb_converted └ publish to follower orchestrator_node.py:374 orch.cb_pre_serialize follower robot node robot_node.py:334 follower.command_parsed robot_node.py:339 follower.command_pre_hardware_write └ safety validate base.py:294 (validate_command) └ pre-lock fairino.py follower.command_pre_lock └ acquire _sdk_lock fairino.py:842 follower.command_lock_acquired └ ServoJ RPC fairino.py:843 follower.command_applied

3 · End-to-end result

2.43 msmedian (p50) latency

11.0 msp99 latency

±2.49 msstd dev (the jitter)

End-to-end leader-read → follower-ServoJ latency, n=3021: mean 3.20, p50 2.43, p90 7.09, p99 10.98, max 46.8 ms. The distribution is heavily right-skewed — most commands are fast (~1–2 ms) but a long tail reaches 7–11 ms. That spread is the jitter the operator feels.

1261

509

422

255

299

213

0–1

1–2

2–3

3–4

4–5

5–7.5

7.5–10

10–15

end-to-end latency (ms) — command count per bucket

4 · Where the time (and jitter) goes — per hop

Per-hop latency across all 3021 commands. The two highlighted rows are the jitter drivers; the green row is the SDK lock (the hypothesised culprit).

hop	p50	p95	p99	max	PDV p99	contribution
leader CNDE read	0.07	0.85	1.93	45.3*	2.52
state deserialize	0.10	0.38	1.61	2.20	1.37
state-mgr → callback handoff	0.02	0.10	0.14	0.20	0.07
callback scheduling pre_callback→cb_enter	1.22	5.35	6.45	9.24	3.24
leader→follower convert	0.06	1.53	2.69	6.94	2.80
serialize + bus publish	0.04	0.20	0.24	0.50	0.17
safety validate + servo-mode pre_hw→pre_lock	0.59	2.16	3.62	8.26	3.21
_sdk_lock wait pre_lock→lock_acquired	0.001	0.001	0.004	0.01	~0	~0 µs
ServoJ RPC lock_acquired→applied	1.06	3.52	4.43	8.78	3.82

Apply-path rows (validate / lock / ServoJ) are over the 869 commands that actually issued a ServoJ. *the 45.3 ms CNDE-read max is the single startup outlier. Bars show p99 relative to the largest hop.

5 · Verdict on the lock-contention hypothesis

NOT CONTENDED The follower's _sdk_lock is taken between the command_pre_lock and command_lock_acquired stamps in p50 0.001 ms / p99 0.004 ms / max 0.01 ms across 869 ServoJ writes. There is effectively no queue.

Why the original intuition was reasonable but wrong: the apply hop (command_pre_hardware_write → command_applied) is visibly variable (p99 ≈ 5 ms), so it looks like the apply method stalls. Splitting it shows the variance lives in (a) safety.validate_command (numpy bound-checks, base.py:294) and (b) the ServoJ XML-RPC round-trip (fairino.py:843) — never in lock acquisition. On the follower, the only other _sdk_lock taker on the hot path would be the 50 Hz annotate() pass, but on a follower that path reads cached/packet data and does not take the lock (_poll_controller_drag early-returns; wrench/guidance read the CNDE packet). So there is simply nothing to contend with.

6 · Root causes, ranked

① Callback scheduling latency (primary) — p99 6.5 ms, PDV max 8.3 ms

The gap between sysstate.pre_callback (system_state.py:69, just before the state-change callbacks fire) and orch.cb_enter (orchestrator_node.py:362, the teleop conversion callback actually starting). The callback is invoked directly (system_state.py:71-73), so this gap is not code work — it is the time the leader's update waits for the Python interpreter to schedule the work, i.e. GIL hand-off / thread contention between the leader publish thread, the follower publish thread, the orchestrator, and the command-apply path all competing in one process. It is both the largest median hop and the largest jitter contributor.

② ServoJ XML-RPC round-trip (secondary) — p99 4.4 ms

The blocking interface.ServoJ(...) call (fairino.py:843) to the controller. Median ~1 ms but tails to ~9 ms; this is genuine I/O variance to the (simulated) controller and is expected to differ on real hardware.

③ Per-command safety validation (secondary) — p99 3.6 ms

safety.validate_command runs on every setpoint inside base.set_state (base.py:294) before the write. Its numpy bound/speed checks add ~0.6 ms typical, ~8 ms worst-case, and run on the same contended thread.

④ Leader→follower conversion spikes — p99 2.7 ms, max 6.9 ms

cb_enter → cb_converted (orchestrator_node.py:362→368): usually ~60 µs, but occasional multi-ms spikes, again consistent with GIL pauses rather than steady cost.

7 · Side finding — 71% of setpoints never reach the arm

Of 3021 teleop commands, only 869 (29%) issued a ServoJ; the remaining 2152 (71%) returned from set_state in ~8 µs without a hardware write. That path is taken when safety.validate_command returns not ok (base.py:294-296) — the setpoint is dropped, no e-stop. Dropping two of every three frames will read as motion stutter on the follower independent of the latency numbers above. The drop reason (speed/limit/workarea clamp, or duplicate frames) was not isolated here and is the most valuable next investigation. (This is measured behaviour, not yet a confirmed defect.)

8 · Consequence — the jitter makes the CI safety gate flaky

The follower's eef_speed / joint_space_speed guards estimate speed as Δposition/Δtime from the incoming state stream. When a state frame is delayed by the scheduling jitter above, the next frame lands a large Δposition over a small Δt, so the estimated speed spikes far above the real commanded speed and trips an e-stop on a single over-threshold sample (1/1). The identical runs on the faster local host completed 120 laps with no trip.

The CI evidence is unambiguous — and proves the trip is jitter, not motion:

commanded vel	CI result	guard tripped	reported speed
70	trip	joint_space_speed	joint 1 = 114.9 °/s > 70
40	trip	eef_speed	1.099 m/s > 1.0
40 (retry)	trip	joint_space_speed	joint 0 = 81.9 °/s > 70
20	trip	eef_speed	3.049 m/s > 1.0

Halving the velocity from 40 to 20 made the reported speed worse (1.099 → 3.049 m/s) and the guard varies run-to-run — both impossible for a real, velocity-scaled motion. The trip is a delayed-frame artifact. No commanded velocity reliably passes on the stock runner, so a blocking safe-check gate would fail every PR. Until a mitigation lands, the CI step is therefore configured non-blocking (informational); see §9.

9 · Recommendations

Attack the callback-scheduling gap first (biggest win). The leader→convert→publish chain hops several threads in one GIL'd process. Options: run the teleop conversion on a dedicated high-priority thread fed by a bounded queue; minimise other Python work on the publish threads; or move the hot path off the shared interpreter. Re-measure pre_callback→cb_enter after any change.
Make the eef_speed guard jitter-robust so latency spikes don't cause false e-stops: compute speed over a short rolling window rather than a single Δt, require N consecutive over-threshold samples (not 1/1), or clamp Δt to the nominal period. This is what makes the CI gate deterministic.
Investigate the 71% setpoint-drop rate (§7) — log the validate_command rejection reason; if frames are being dropped for transient/duplicate reasons, fixing it will smooth the actual motion.
Trim per-command validation cost on the apply thread (§6③) — cache/vectorise the bound checks, or move them off the command thread.
Leave _sdk_lock alone — it is not the problem. Effort spent on finer locking would not move the numbers.
Velocity is not the lever for the CI gate — 70/40/20 all tripped and 20 was worst (§8). Keep the safe-check non-blocking in CI until either (a) the speed guards are made jitter-robust, or (b) it runs on a larger/faster runner with enough headroom to suppress the scheduling jitter. Only then flip it back to a hard gate.

10 · Reproduce

# fresh sims, then a 60 s traced teleop run with the per-hop breakdown docker rm -f fairino-fr3 fairino-fr20 uv run scripts/profile_teleop_jitter.py --duration-s 60 # render the standard interactive profiler HTML from the raw trace uv run python tools/profiler.py --last 1

Instrumentation is gated by ROBOT_TRACE=1 (set automatically by the profiler script) and is a no-op in production. The command_pre_lock / command_lock_acquired stamps used for the lock analysis live in fairino.py:_set_state_impl.

Generated for the fairino-sim-ci-and-jitter branch · data: 3021 teleop commands, 60 s, FR3→FR20 SimMachine pair · all latencies in milliseconds.