Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v8.7.1 transaction freeze #14949

Open
2 tasks done
vorlov opened this issue Oct 10, 2024 · 8 comments
Open
2 tasks done

v8.7.1 transaction freeze #14949

vorlov opened this issue Oct 10, 2024 · 8 comments
Labels
developer-experience This issue improves error messages, debugging, or reporting
Milestone

Comments

@vorlov
Copy link

vorlov commented Oct 10, 2024

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the bug has not already been reported

Mongoose version

v8.7.1

Node.js version

20.16.0

MongoDB server version

Atlas Latest

Typescript version (if applicable)

5.6.3

Description

const session = await this.jobModel.db.startSession(); this code freezes on v8.7.1, but works fine on v8.5.1

Steps to Reproduce

try to start session

Expected Behavior

no freezes

@vkarpov15
Copy link
Collaborator

Can you please provide a more complete repro script, like with how you're instantiating your model and where you're calling startSession()? We have transaction tests that passed on 8.7.1 release so the issue isn't as simple as "startSession() is broken in all cases on 8.7.1"

Image

@vkarpov15 vkarpov15 added the needs clarification This issue doesn't have enough information to be actionable. Close after 14 days of inactivity label Oct 16, 2024
Copy link

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions github-actions bot added the Stale label Oct 31, 2024
Copy link

github-actions bot commented Nov 6, 2024

This issue was closed because it has been inactive for 19 days and has been marked as stale.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 6, 2024
@t1bb4r
Copy link
Contributor

t1bb4r commented Dec 31, 2024

I'm experiencing the same issue. We have a test suite in which we experience this issue but each test passes in isolation so the code is difficult to reproduce. Downgrading to 8.5.5 did solve the problem for me.

I can share some debug logs. We do drop all our collections between each unit test. We use the mongodb driver to drop the collection (in the after each hook) and then use mongoose to start a session at which point the process seems to freeze. I'm trying to create a reproducible example.

  MongoMS:MongoInstance Mongo[40299]: stdoutHandler: ""{"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"COMMAND",  "id":518070,  "ctx":"conn32","msg":"CMD: drop","attr":{"namespace":"test.orders"}}"" +17ms
  MongoMS:MongoInstance Mongo[40299]: stdoutHandler: ""{"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"COMMAND",  "id":518070,  "ctx":"conn25","msg":"CMD: drop","attr":{"namespace":"test.users"}}"" +0ms
  MongoMS:MongoInstance Mongo[40299]: stdoutHandler: ""{"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"STORAGE",  "id":20314,   "ctx":"conn32","msg":"dropCollection: storage engine will take ownership of drop-pending collection","attr":{"namespace":"test.orders","uuid":{"uuid":{"$uuid":"424adbb1-674c-4723-9793-31d7afb59a28"}},"dropOpTime":{"ts":{"$timestamp":{"t":0,"i":0}},"t":-1},"commitTimestamp":{"$timestamp":{"t":0,"i":0}}}}"" +0ms
  MongoMS:MongoInstance Mongo[40299]: stdoutHandler: ""{"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"STORAGE",  "id":20314,   "ctx":"conn25","msg":"dropCollection: storage engine will take ownership of drop-pending collection","attr":{"namespace":"test.users","uuid":{"uuid":{"$uuid":"5903f952-7054-47a9-8606-04534e698af1"}},"dropOpTime":{"ts":{"$timestamp":{"t":0,"i":0}},"t":-1},"commitTimestamp":{"$timestamp":{"t":0,"i":0}}}}
  MongoMS:MongoInstance {"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"STORAGE",  "id":20318,   "ctx":"conn32","msg":"Finishing collection drop","attr":{"namespace":"test.orders","uuid":{"uuid":{"$uuid":"424adbb1-674c-4723-9793-31d7afb59a28"}}}}
  MongoMS:MongoInstance {"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"STORAGE",  "id":20318,   "ctx":"conn25","msg":"Finishing collection drop","attr":{"namespace":"test.users","uuid":{"uuid":{"$uuid":"5903f952-7054-47a9-8606-04534e698af1"}}}}"" +0ms
  MongoMS:MongoInstance Mongo[40299]: stdoutHandler: ""{"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"STORAGE",  "id":22206,   "ctx":"conn32","msg":"Deferring table drop for index","attr":{"index":"_id_","namespace":"test.orders","uuid":{"uuid":{"$uuid":"424adbb1-674c-4723-9793-31d7afb59a28"}},"ident":"index-548--1581127540233199531","commitTimestamp":{"$timestamp":{"t":1735635714,"i":66}}}}
  MongoMS:MongoInstance {"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"STORAGE",  "id":22214,   "ctx":"conn32","msg":"Deferring table drop for collection","attr":{"namespace":"test.orders","ident":"collection-547--1581127540233199531","commitTimestamp":{"$timestamp":{"t":1735635714,"i":66}}}}"" +0ms
  MongoMS:MongoInstance Mongo[40299]: stdoutHandler: ""{"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"STORAGE",  "id":22206,   "ctx":"conn25","msg":"Deferring table drop for index","attr":{"index":"_id_","namespace":"test.users","uuid":{"uuid":{"$uuid":"5903f952-7054-47a9-8606-04534e698af1"}},"ident":"index-546--1581127540233199531","commitTimestamp":{"$timestamp":{"t":1735635714,"i":67}}}}
  MongoMS:MongoInstance {"t":{"$date":"2024-12-31T11:01:54.979+02:00"},"s":"I",  "c":"STORAGE",  "id":22214,   "ctx":"conn25","msg":"Deferring table drop for collection","attr":{"namespace":"test.users","ident":"collection-545--1581127540233199531","commitTimestamp":{"$timestamp":{"t":1735635714,"i":67}}}}"" +0ms
...
MongoMS:MongoInstance Mongo[40299]: stdoutHandler: ""{"t":{"$date":"2024-12-31T11:01:55.365+02:00"},"s":"I",  "c":"TXN",      "id":51802,   "ctx":"conn32","msg":"transaction","attr":{"parameters":{"lsid":{"id":{"$uuid":"d655c7e5-2603-4582-9eb2-e8763763c49c"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":66,"txnRetryCounter":0,"autocommit":false,"readConcern":{"level":"local","provenance":"implicitDefault"}},"readTimestamp":"Timestamp(0, 0)","keysExamined":8,"docsExamined":26,"ninserted":5,"keysInserted":5,"terminationCause":"committed","timeActiveMicros":2308,"timeInactiveMicros":103305,"numYields":0,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"w":1}},"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"w":1}},"Database":{"acquireCount":{"w":2}},"Collection":{"acquireCount":{"w":6}},"Mutex":{"acquireCount":{"r":54}}},"storage":{},"wasPrepared":false,"durationMillis":105}}"" +120ms

MongoMS:MongoInstance Mongo[40299]: stdoutHandler: ""{"t":{"$date":"2024-12-31T11:02:02.735+02:00"},"s":"I",  "c":"EXECUTOR", "id":6983000, "ctx":"conn3","msg":"Slow SessionWorkflow loop","attr":{"elapsed":{"totalMillis":10005,"activeMillis":10005,"receiveWorkMillis":0,"processWorkMillis":10005,"sendResponseMillis":0,"finalizeMillis":0}}}"" +60d

@vkarpov15
Copy link
Collaborator

@t1bb4r can you please provide some code samples that show exactly what you mean by "We use the mongodb driver to drop the collection (in the after each hook) and then use mongoose to start a session"?

@t1bb4r
Copy link
Contributor

t1bb4r commented Feb 6, 2025

@vkarpov15 I was finally able to reproduce this issue after trying out mongoose 8.10. Here is a link to the code that I've been using to reproduce the issue, https://github.com/t1bb4r/mongoose-issue-14949/blob/main/test.js.

In mongoose 8.10, instead of my tests freezing I got a new error "Connection operation buffering timed out after 10000ms". With this new error I noticed that manipulating the clock with sinon was the root cause of the problem. Adding sinon useFakeTimer to my test was the missing piece all along.

The shared test file results with different mongoose versions:
8.6.0 works
8.7.0 - 8.9 tests simply timeout after freezing on startSession
8.10 "Connection operation buffering timed out after 10000ms" - but only after 5 or 6 tests, not immediately. How many tests pass seems to different on different machines.

After some investigation I suspect that these heartbeat changes do not play well with sinon fake timers, maybe this PR #14812? Now that I've finally started smelling the smoke I can investigate a bit further and will let you know if I find something.

@t1bb4r
Copy link
Contributor

t1bb4r commented Feb 6, 2025

I think that I understand the issue a lot better now after some reading. I can explain the issue that I'm having by focusing on this piece of code:

    if (
      this._readyState === STATES.connected &&
      this._lastHeartbeatAt != null &&
      typeof this.client?.topology?.s?.description?.heartbeatFrequencyMS === 'number' &&
      Date.now() - this._lastHeartbeatAt >= this.client.topology.s.description.heartbeatFrequencyMS * 2) {
      return STATES.disconnected;
    }

In my case the test suites will run, and most tests will finish in under 30 seconds so heartbeat will be null for the full duration of the test and the issue will not appear. The issue only appears after the first heart beat sets the date, and then date is then modified by sinon causing it to be larger than heartbeatFrequencyMS * 2 resulting in a disconnected state which then causes the buffer timeout.

At the end of the day it is not a mongoose issue, but it's difficult to manipulate the tests and timers to get this working. Could we opt out of this check in our testing environment since the connection is still alive?

@vkarpov15 vkarpov15 reopened this Feb 6, 2025
@vkarpov15 vkarpov15 added this to the 8.10.1 milestone Feb 6, 2025
@vkarpov15 vkarpov15 added has repro script There is a repro script, the Mongoose devs need to confirm that it reproduces the issue and removed needs clarification This issue doesn't have enough information to be actionable. Close after 14 days of inactivity Stale labels Feb 6, 2025
@vkarpov15
Copy link
Collaborator

It is worth mentioning that we recommend not using useFakeTimers(). Stubbing out time can mix up both MongoDB and Mongoose's connection monitoring, which both rely on heartbeat intervals and checking how much time has elapsed.

That being said, the easiest way to work around this issue is to disable bufferCommands: await mongoose.connect(mongoUri, { bufferCommands: false });. Mongoose will still report readyState = 'disconnected', but it won't buffer any commands.

@vkarpov15 vkarpov15 added help This issue can likely be resolved in GitHub issues. No bug fixes, features, or docs necessary and removed has repro script There is a repro script, the Mongoose devs need to confirm that it reproduces the issue labels Feb 7, 2025
@vkarpov15 vkarpov15 removed this from the 8.10.1 milestone Feb 14, 2025
@vkarpov15 vkarpov15 reopened this Feb 14, 2025
@vkarpov15 vkarpov15 added developer-experience This issue improves error messages, debugging, or reporting and removed help This issue can likely be resolved in GitHub issues. No bug fixes, features, or docs necessary labels Feb 14, 2025
@vkarpov15 vkarpov15 added this to the Parking Lot milestone Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
developer-experience This issue improves error messages, debugging, or reporting
Projects
None yet
Development

No branches or pull requests

3 participants