-
-
Notifications
You must be signed in to change notification settings - Fork 261
try to resolve race condition, third time #1439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
How's this going? |
90dd402 to
3812f47
Compare
|
Hey, I got a notification for this, is something happening? |
yes, I changed my approach and it is now ready for review. instead of separating listening/publishing in 2 separate connections, I just added code that re-establishes all the subscriptions on reconnection so we can use a single connection |
|
Good morning. This all seems reasonable to me, though will need to take a closer look :) |
TheArcaneBrony
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from another cursory glance, though the jitter system and timeouts seem a bit... overkill?
| guilds.forEach(async (guild) => { | ||
| const permission = await getPermission(this.user_id, guild.id); | ||
| this.permissions[guild.id] = permission; | ||
| this.events[guild.id] = await listenEvent(guild.id, consumer, opts); | ||
| for (const guild of guilds) { | ||
| const permission = await getPermission(this.user_id, guild.id); | ||
| this.permissions[guild.id] = permission; | ||
| this.events[guild.id] = await listenEvent(guild.id, consumer, opts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(+ other examples)
Is there any reason for rewriting these as regular for loops?
I would've expected it to be more logical to rewrite them as await Promise.whenAll(guilds.map(... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ForEach does not support async callback functions so I rewrote it to regular for loops. I could do a map function and a Promise.all though like you're suggesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would probably be beneficial, especially under load (or ie. my account, where there's a lot of guilds and channels lol)
|
|
||
| // Exponential backoff with jitter | ||
| const baseDelay = Math.min(this.BASE_RECONNECT_DELAY_MS * Math.pow(2, this.reconnectAttempts - 1), this.MAX_RECONNECT_DELAY_MS); | ||
| // Add jitter (±25%) to prevent thundering herd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting to use unicode here instead of a ~... 🤨
Is this AI-generated? lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I did ask Gemini about a good way to do reconnect logic and it gave me this back off delay jitter bs. I could remove it if you think it is overkill
The problem was not properly addressed last time because the exception was crashing the whole connection instead of just the channel, so the exception got thrown again when we tried creating a new channel on the old disconnected connection.
To address the instability that is introduced by publishing to a queue that might or might not be there, due to the race condition,
we now isolate the connections. We have separate connections for publishing and consuming messages, this way, when the exception triggers our publishing connection to close, our event consumers are unaffected since they will be using a separate connection. This also means that we can easily restart the connection without worrying about having to reconstruct any state (ie recreating all the user channels and restablishing all the event consumers) since the publishing connection does not have any event consumers and was used purely for publishing eventsWhen RabbitMQ connection drops (including from 506 RESOURCE_ERROR):
RabbitMQ.tsdetects the connection close and emitsdisconnectedreconnected