
Conversation

@s074
Member

@s074 s074 commented Dec 15, 2025

The problem was not properly addressed last time: the exception was crashing the whole connection instead of just the channel, so the exception was thrown again when we tried to create a new channel on the now-disconnected connection.

To address the instability introduced by publishing to a queue that may or may not exist (the race condition), we now isolate the connections. We have separate connections for publishing and consuming messages, so when the exception forces our publishing connection to close, our event consumers are unaffected, since they use a separate connection. This also means we can easily restart the publishing connection without worrying about reconstructing any state (i.e. recreating all the user channels and re-establishing all the event consumers), since the publishing connection has no event consumers and is used purely for publishing events.
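A minimal sketch of the connection-isolation idea described above. All names here (`IsolatedBroker`, the injected `connect` factory) are illustrative, not the actual Spacebar code; the connection is injected so the sketch stays self-contained.

```typescript
// Sketch of the connection-isolation approach: publishing and consuming
// use separate connections, so a publish error that closes its own
// connection never tears down the consumers. Names are illustrative.
interface Connection {
	close(): Promise<void>;
}
type ConnectFn = () => Promise<Connection>;

class IsolatedBroker {
	private publishConn?: Connection;
	private consumeConn?: Connection;

	constructor(private connect: ConnectFn) {}

	async init(): Promise<void> {
		// Two independent connections: one purely for publishing,
		// one that holds all the consumer channels.
		this.publishConn = await this.connect();
		this.consumeConn = await this.connect();
	}

	// If publishing fails, only the publishing connection is restarted;
	// there is no consumer state to rebuild on it.
	async restartPublisher(): Promise<void> {
		await this.publishConn?.close().catch(() => {});
		this.publishConn = await this.connect();
	}
}
```

The key property is that `restartPublisher` never touches `consumeConn`, which is why no consumer state has to be rebuilt.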

When the RabbitMQ connection drops (including from a 506 RESOURCE_ERROR):

  1. RabbitMQ.ts detects the connection close and emits disconnected
  2. After exponential backoff delay, it attempts to reconnect
  3. On successful reconnect, it emits reconnected
  4. All consumers (WebSocket listeners, spacebar events, rate limits) re-establish their subscriptions
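The four steps above can be sketched roughly like this. `RabbitMQBus` and the injected `connect` function are illustrative stand-ins, not the actual RabbitMQ.ts code, and the backoff here omits the jitter discussed later.

```typescript
import { EventEmitter } from "events";

// Hedged sketch of the reconnect lifecycle described above. The real
// code wraps amqplib; here the connection is injected so the logic is
// self-contained and testable.
interface Conn {
	on(ev: "close", cb: () => void): void;
}
type ConnectFn = () => Promise<Conn>;

class RabbitMQBus extends EventEmitter {
	private attempts = 0;

	constructor(
		private connect: ConnectFn,
		private baseDelayMs = 1000,
	) {
		super();
	}

	// Step 1: on connect, watch for the connection closing and emit
	// "disconnected" when it does.
	async start(): Promise<void> {
		const conn = await this.connect();
		this.attempts = 0;
		conn.on("close", () => {
			this.emit("disconnected");
			void this.reconnectLoop();
		});
	}

	// Steps 2-3: retry with exponential backoff until reconnected,
	// then emit "reconnected".
	private async reconnectLoop(): Promise<void> {
		for (;;) {
			this.attempts++;
			const delay = this.baseDelayMs * 2 ** (this.attempts - 1);
			await new Promise((r) => setTimeout(r, delay));
			try {
				await this.start();
				this.emit("reconnected");
				return;
			} catch {
				// broker still down; back off further and retry
			}
		}
	}
}

// Step 4: each consumer re-establishes its subscriptions on the event:
// bus.on("reconnected", () => resubscribeAllListeners());
```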

@s074 s074 marked this pull request as draft December 15, 2025 01:03
@MathMan05
Contributor

How's this going?

@s074 s074 force-pushed the add-rabbitmq-error-handling branch from 90dd402 to 3812f47 Compare January 22, 2026 23:10
@MathMan05
Contributor

Hey, I got a notification for this, is something happening?

@s074 s074 marked this pull request as ready for review January 22, 2026 23:21
@s074
Member Author

s074 commented Jan 22, 2026

Hey, I got a notification for this, is something happening?

Yes, I changed my approach and it is now ready for review. Instead of separating listening and publishing into two separate connections, I added code that re-establishes all the subscriptions on reconnection, so we can keep using a single connection.
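A rough sketch of the revised single-connection approach: subscriptions are recorded as they are made so they can be replayed onto the fresh connection after a reconnect. `SubscriptionRegistry` and its methods are hypothetical names for illustration, not the actual implementation.

```typescript
// Sketch: remember every subscription so it can be re-established
// after a reconnect, instead of keeping a second connection around.
type Handler = (msg: unknown) => void;

class SubscriptionRegistry {
	private subs = new Map<string, Handler>();

	// Called whenever a listener subscribes; records the subscription
	// so it survives a connection drop.
	add(queue: string, handler: Handler): void {
		this.subs.set(queue, handler);
	}

	// Called on the "reconnected" event: replay every recorded
	// subscription onto the new connection via the given subscribe fn.
	async replay(
		subscribe: (queue: string, h: Handler) => Promise<void>,
	): Promise<void> {
		for (const [queue, handler] of this.subs) {
			await subscribe(queue, handler);
		}
	}
}
```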

@TheArcaneBrony
Member

Good morning. This all seems reasonable to me, though will need to take a closer look :)

Member

@TheArcaneBrony TheArcaneBrony left a comment

Looks good from another cursory glance, though the jitter system and timeouts seem a bit... overkill?

Comment on lines -116 to +121
-	guilds.forEach(async (guild) => {
-		const permission = await getPermission(this.user_id, guild.id);
-		this.permissions[guild.id] = permission;
-		this.events[guild.id] = await listenEvent(guild.id, consumer, opts);
+	for (const guild of guilds) {
+		const permission = await getPermission(this.user_id, guild.id);
+		this.permissions[guild.id] = permission;
+		this.events[guild.id] = await listenEvent(guild.id, consumer, opts);
Member

(+ other examples)
Is there any reason for rewriting these as regular for loops?
I would've expected it to be more logical to rewrite them as await Promise.all(guilds.map(... ?

Member Author

forEach does not await async callbacks (it just fires them off), so I rewrote these as regular for loops. I could do a map plus Promise.all like you're suggesting, though.

Member

That would probably be beneficial, especially under load (or e.g. my account, where there are a lot of guilds and channels lol)
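The trade-off being discussed, sketched with a hypothetical `fetchPermission` standing in for `getPermission`: a for...of awaits each guild one at a time, while Promise.all over a map runs all lookups concurrently, so total latency is roughly one lookup instead of one per guild.

```typescript
// Hypothetical stand-in for getPermission(): resolves after a short
// delay, simulating a database/permission lookup per guild.
const fetchPermission = (guildId: string): Promise<string> =>
	new Promise((resolve) =>
		setTimeout(() => resolve(`perm:${guildId}`), 20),
	);

// Sequential: total time grows linearly with the number of guilds.
async function loadSequential(guilds: string[]) {
	const out: Record<string, string> = {};
	for (const id of guilds) {
		out[id] = await fetchPermission(id);
	}
	return out;
}

// Concurrent: all lookups start at once; total time is roughly that
// of a single lookup.
async function loadConcurrent(guilds: string[]) {
	const perms = await Promise.all(guilds.map(fetchPermission));
	return Object.fromEntries(guilds.map((id, i) => [id, perms[i]]));
}
```

Both produce the same result; only the total wall-clock time differs.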


// Exponential backoff with jitter
const baseDelay = Math.min(this.BASE_RECONNECT_DELAY_MS * Math.pow(2, this.reconnectAttempts - 1), this.MAX_RECONNECT_DELAY_MS);
// Add jitter (±25%) to prevent thundering herd
Member

Interesting to use unicode here instead of a ~... 🤨
Is this AI-generated? lol

Member Author

Yes, I did ask Gemini about a good way to do reconnect logic and it gave me this backoff-delay jitter bs. I could remove it if you think it is overkill.
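For reference, the backoff-plus-jitter computation from the quoted diff hunk, completed as a standalone sketch (the ±25% jitter line itself was truncated in the quote, so the jitter formula below is a reconstruction; the constants are illustrative):

```typescript
// Illustrative values; the PR's actual constants may differ.
const BASE_RECONNECT_DELAY_MS = 1000;
const MAX_RECONNECT_DELAY_MS = 30_000;

// Exponential backoff capped at MAX, then ±25% random jitter so that
// many clients reconnecting at once don't retry in lockstep (the
// "thundering herd" the comment mentions). rand is injectable for tests.
function reconnectDelay(
	attempt: number,
	rand: () => number = Math.random,
): number {
	const base = Math.min(
		BASE_RECONNECT_DELAY_MS * Math.pow(2, attempt - 1),
		MAX_RECONNECT_DELAY_MS,
	);
	// rand() in [0, 1) maps to a jitter in [-0.25 * base, +0.25 * base)
	const jitter = (rand() * 2 - 1) * 0.25 * base;
	return base + jitter;
}
```

So attempt 1 yields roughly 0.75-1.25 s, attempt 3 roughly 3-5 s, and every later attempt stays within ±25% of the 30 s cap.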

