Skip to content

Conversation

@taspelund
Copy link
Contributor

This updates PR #598 by more tightly integrating the NDP neighbor discovery for unnumbered peers with the BGP FSM.

Data structures have been updated such that BGP peers can now be identified by an interface name in addition to an IP address, which used to be the only way to do so.

An FSM is now created when an unnumbered interface is registered as a BGP peer, and the NDP discovery state is queried prior to outbound connection attempts and before routing a new inbound connection to an FSM. This allows the BGP FSM to manage its own state, with the NDP discovery state only affecting its ability to complete connections. NDP discovery (and the BGP unnumbered session for the associated interface) is no longer a one-shot operation, rather an ongoing passive one. Since this is no longer a one-shot operation, there is no longer a concept of a pending unnumbered peer - just an unnumbered peer keyed by interface name which may or may not succeed its connection attempts depending on what the NDP cache has in it.

The NDP integration into the FSM is done in a lightweight fashion that allows each component to have clean separations of concerns. The NDP manager handles interface registration/unregistration events by spawning tx/rx threads that send/receive NDP packets + maintain a single-entry neighbor cache. Updates to this cache are not signaled to BGP, thus ensuring BGP can manage its own state without NDP notifications causing churn. BGP relies on its own events/timers to determine whether the neighbor (and its session) remain valid rather than automatically triggering a reset of the session upon NDP cache expiry/swap.

Similarly, instead of having NDP manage an IP -> interface mapping (which has the consequence of leaving us susceptible to state corruption if multiple links discover the same neighbor IP) we rely on BGP to include the unnumbered interface as the nexthop_interface in the route Path. This ensures that there is only a single source of truth for what interface and next-hop should be used (BGP), rather than potentially allowing two different caches to get out of sync.
This prevents the following scenario:

  1. NDP discovers fe80::1 via eth0 (nexthop cache: fe80::1 -> eth0)
  2. BGP establishes to fe80::1%eth0
  3. BGP learns Route X from fe80::1%eth0
  4. NDP discovers fe80::1 via eth1 (nexthop cache: fe80::1 -> eth1)
  5. mg-lower queries the nexthop cache for fe80::1 and gets back eth1
  6. mg-lower installs Route X via fe80::1%eth1 (wrong!)

Integration tests have been added to validate v4/v6 routes are properly exchanged over unnumbered sessions, NDP events do not trigger BGP events, and that multiple peers with the same link-local IP are handled properly. NDP events are tested via an UnnumberedManagerMock impl of trait UnnumberedManager, which allows us to manually trigger NDP events.

Other small items:

  • moved tx/rx ndp threads to use ManagedThread pattern
  • provide name to tx/rx ndp threads
  • removing pending unnumbered sessions and its APIs

Moves unnumbered peers from being a one-shot operation that creates an
FSM when NDP discovery is successful, to having a persistent FSM that is
created/deleted when the unnumbered peer is created/deleted (just like
numbered peers). Since the FSM is now always present and there's no
more pending peers, the pending unnumbered members concept has been
removed.

The FSM is updated to query the UnnumberedManager for an NDP neighbor
as part of initiate_connection(). This allows the FSM to manage its own
state transitions based on FSM events rather than allowing NDP events
to drive the state machine. The outbound connection attempt logic is
unchanged: connection attempts are still asynchronous and failures
are transparent to the FSM, allowing it to transition back to Idle
upon a ConnectRetryTimerExpires event if a successful connection event
doesn't arrive first. However, now an unknown NDP neighbor at the time
of the connection attempt is treated as an async failure (no outbound
connection success will be reported) and the FSM will cycle back to Idle
when ConnectRetryTimerExpires. Similarly, if the FSM receives an inbound
connection attempt and there is no known NDP neighbor, the connection
is rejected.

The unnumbered manager receives interface registration/unregistration
events when unnumbered peers are created/deleted. While the unnumbered
peer exists, the unnumbered manager runs the tx/rx loops to maintain
a single-entry NDP cache holding the discovered peer. The unnumbered
manager also maintains a map of scope_id to interface name (learned from
the OS). The NDP cache is queried by the FSM during connection attempts,
and the scope_id to interface map is used by the connection Dispatcher.

The Dispatcher has been updated to query the unnumbered manager to map
a scope_id (from the SocketAddr it accept()'s) back to an interface
name which allows it to find the appropriate SessionRunner/FSM. Source
IP validation for the inbound connection is done by the FSM: the new
connection's Source IP is checked against the NDP peer IP if there is no
ongoing connection, else it will check against the ongoing connection's
Peer IP (If it doesn't match, reject the new connection. If it does
match, consider it a collision.).

In order to facilitate all this session mapping, the addr_to_session
data structures have been updated (now called peer_to_session) to
use a new PeerId enum as the key instead of an IpAddr. PeerId has
two variants: Ip that holds an IpAddr (matching the old behavior)
and Interface which holds a String of the interface name. This allows
the session lookups to work for unnumbered peers by interface name
without needing to know anything about the dynamic NDP peer state.

Local BGP integration tests are added to exercise the unnumbered code
paths for session establishment, route exchange, and NDP state changes.
The NDP state changes are manually manipulated using a new trait called
UnnumberedManager and its test impl UnnumberedManagerMock. The existing
code is accessed via the prod impl UnnumberedManagerNdp.
Add  test coverage for unnumbered BGP sessions, including
multi-interface support, NDP lifecycle testing, and route exchange
validation.

Channel based tests rely on each bind() call being used to register
ownership of a single SocketAddr into the NET HashMap. This means that
in order to have more than 1 peer in a channel based test, you must call
bind() multiple times. However, the Dispatcher is responsible for bind()
as well as for spawning a Listener for inbound connections. So to make
this work, the new test_three_router_chain_unnumbered() creates a
Dispatcher per unnumbered interface. Since it is a SocketAddr being
registered in NET, we can re-use the same link-local address in multiple
places simply by disambiguating them with a unique scope_id. This also
allows us to test having two peers with the same link-local address
learned via two different links.

New tests verify:
- Sessions persist through NDP updates and expiry
- Sessions reconnect with current NDP neighbor after reset
- scope_id isolates same link-local IP on different interfaces
- IPv4/IPv6 routes exchange with correct link-local nexthops

Fixes existing unnumbered tests by correcting scope_id handling.
Get rid of the unnumbered nexthop hash map in ndp manager in favor of
encoding BGP unnumbered nexthop interfaces into the Path itself. This
ensures that BGP is the source of truth for the next-hop and interface
used for routes from a given session, rather than using the NDP cache
which can update without BGP's knowledge. This also has the added
benefit of allowing BGP to disambiguate the same link-local address
across multiple interfaces -- each unnumbered peer knows what its
interface name is, and thus can supply its name alongside the link-local
next-hop in order to support the same link-local address via multiple
interfaces at the same time.

This also updates the mgd API to support both numbered and unnumbered
peers via unified endpoints. Unfortunately, dropshot cannot represent
a non-scalar type (like enum PeerId) as a Query. So the API has been
moved to representing the peer as a String, which will be parsed as an
IpAddr if possible with a fallback of accepting the String as an
interface name.
@taspelund taspelund self-assigned this Jan 27, 2026
@taspelund taspelund added bgp Border Gateway Protocol mg-lower Maghemite lower half mgd Maghemite daemon customer For any bug reports or feature requests tied to customer requests rust Pull requests that update rust code labels Jan 27, 2026
@taspelund taspelund changed the base branch from main to ry/mgd-v6-router-disco January 27, 2026 01:43
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Moving common neighbor args into NeighborCommon resulted in some of the
CLI args appearing in a different order, breaking scripts elsewhere
(e.g. init.sh for interop topology). This unifies numbered/unnumbered
neighbors into a single CLI which is then split into Neighbor +
UnnumberedNeighbor API types for the appropriate call.
Expose NDP interfaces and their neighbor cache via the API. This allows
an operator to see what the current config/state is for a managed
unnumbered interface.

Renames get_neighbors_unified() to get_neighbors_v4() aligning with the
versioned method names.
@taspelund taspelund force-pushed the trey/bgp-unnumbered-update branch from 00620bf to 313e59a Compare January 28, 2026 00:00
Fixes issue where update API would return "no such interface" errors
for unnumbered neighbors. Allows API handlers to determine if there
is already an active session for unnumbered peers. Without this, there
was no handling for updates to an existing unnumbered peer (requiring a
delete/add workflow).
Fix lookup that was using PeerId::Ip instead of PeerId::Interface.
Ensure that interface removal empties the interface scope map even if
the interface isn't a registered interface (e.g. if there was a partial
failure during an earlier call).
Fixes a bug where multiple peers with the same IP have their routes
tracked as coming from the same source. This is really only exposed
when using link-local addrs (BGP unnumbered) since that's the only time
the same IP can be found on multiple peers.

Scenario:
- R2 has BGP unnumbered peers R1 / R3
- R1/R3 both use the same link-local IP, e.g. fe80::1
- R2 learns 3fff:10::/64 from R1, 3fff:30::/64 from R3
- R2 stores fe80::1 in BgpPathProperties.peer field for both R1/R3
- R1 shuts down the BGP session
- R2 cleans up all BGP routes matching BgpPathProperties.peer == fe80::1
- 3fff:20::/64 is correctly removed from the RIB
- 3fff:30::/64 is erroneously removed from the RIB <-- wrong!

test_three_router_chain_unnumbered has been updated to exercise this
exact scenario - which failed prior to the IpAddr -> PeerId type update.

Also updates API to use Path where peer is PeerId instead of IpAddr.
@taspelund taspelund force-pushed the trey/bgp-unnumbered-update branch from 4027dde to d94d5ed Compare January 28, 2026 20:02
Fixes CLI commands that only accepted IP addresses, preventing
operations on unnumbered BGP peers (identified by interface name).

Changes:
- Clear command: addr (IpAddr) → peer (String)
  - Added routing logic to detect peer type via parse_peer_type()
  - Routes to clear_neighbor_v2() for numbered peers
  - Routes to clear_unnumbered_neighbor() for unnumbered peers

- FSM history: peer (Option<IpAddr>) → peer (Option<String>)
  - Parses peer string to PeerId enum before API call
  - Updated get_fsm_history() handler to convert types

- Message history: peer (IpAddr) → peer (String)
  - Parses peer string to PeerId enum before API call
  - Updated get_message_history() handler to convert types

- Removed dead code: ExportPolicy struct (never used)

All commands now follow the pattern established by neighbor read/
delete/create commands, which already supported both peer types.

Users can now specify peers as either IP addresses (numbered) or
interface names (unnumbered) consistently across all CLI commands.
@taspelund taspelund force-pushed the trey/bgp-unnumbered-update branch from d94d5ed to ce0c6cb Compare January 28, 2026 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bgp Border Gateway Protocol customer For any bug reports or feature requests tied to customer requests mg-lower Maghemite lower half mgd Maghemite daemon rust Pull requests that update rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants