Description
We found that when `connectionTimeoutMillis` is hit, the following things happen:
- A connection is opened on the Postgres side, and is never actually utilized
- `pg-pool` continues tracking the connection but never marks it as idle
- The connection isn't cleaned up until the pool itself is closed (or the Node.js process dies)
We were able to track these connections in Postgres by running the following query:
select * from pg_stat_activity
where query_start is NULL and state = 'idle';
What's unusual is that the connection "times out" on the pool side, but an idle connection remains open on the Postgres side without ever being marked as idle in the pool. This might suggest a race condition between the timeout and the client connection.
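To make that suspicion concrete, here is a rough, self-contained sketch of the ordering we think happens; the helper names and structure are illustrative only, not the actual pg-pool implementation:

```js
// Illustrative sketch of the suspected race. Not the real pg-pool code.
function acquireWithTimeout(client, connectionTimeoutMillis, callback) {
  let timedOut = false;

  const timer = setTimeout(() => {
    // 1. The timer fires first: the caller's acquire is rejected with
    //    "timeout exceeded when trying to connect".
    timedOut = true;
    callback(new Error('timeout exceeded when trying to connect'));
  }, connectionTimeoutMillis);

  client.connect((err) => {
    clearTimeout(timer);
    if (timedOut) {
      // 2. The handshake that was already in flight can still complete after
      //    the timeout. The caller was never handed this client, so nothing
      //    ever releases or ends it, and the server-side session sits "idle".
      return;
    }
    callback(err, client);
  });
}
```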
Reproduction script
/* eslint-disable prettier/prettier */
import Pool from 'pg-pool';
import Client from 'pg-native';

const pool = new Pool({
  port: 5432,
  // Setting this to 1 will cause everything to crash.
  max: 2,
  connectionTimeoutMillis: 1000,
  // Comment this out to try with `pg`
  // Client,
  log: (msg) => {
    console.log(msg);
  }
});

// Delay the first physical connect past connectionTimeoutMillis (1000ms)
// so the pool's timeout always fires before the first connection completes.
let connection = 0;
const baseConnect = pool.Client.prototype.connect;
pool.Client.prototype.connect = function (cb) {
  if (connection === 0) {
    connection += 1;
    return setTimeout(() =>
      baseConnect.call(this, cb),
      1100
    );
  }
  connection += 1;
  return baseConnect.call(this, cb);
};

let acquireCount = 0;
pool.on('acquire', function (client) {
  acquireCount++;
});
let connectCount = 0;
pool.on('connect', function () {
  connectCount++;
});
let releaseCount = 0;
pool.on('release', function () {
  releaseCount++;
});
let removeCount = 0;
pool.on('remove', function () {
  removeCount++;
});
let errorCount = 0;
pool.on('error', function () {
  errorCount++;
});

await pool.connect();

let zombiesCount = 0;
const res = await pool.query(
  'select count(*) from pg_stat_activity where query_start is NULL and state = $1::text;',
  ['idle']
);
if (res.rows) {
  zombiesCount = res.rows[0]['count'];
} else {
  zombiesCount = res[0]['count'];
}

console.log("-----------------");
console.log("Total connections", connectCount);
console.log("Total acquisitions: ", acquireCount);
console.log("Total releases: ", releaseCount);
console.log("Total removes: ", removeCount);
console.log("Total errors: ", errorCount);
console.log("Total leaked connections: ", zombiesCount);
console.log("Total current pool size: ", pool.totalCount);
console.log("Total current idle pool size: ", pool.idleCount);
console.log("-----------------");
pool.end();
Without exhausting the pool
Here you can see the issue in action, where a connection is held open after the initial failed `connect` call and we end with a pool of two connections, only one of which is idle.
With `pg` as client
➜ node test.mjs
checking client timeout
connecting new client
ending client due to timeout
new client connected
checking client timeout
connecting new client
new client connected
dispatching query
query dispatched
pulse queue
no queued requests
-----------------
Total connections 2
Total acquisitions: 2
Total releases: 1
Total removes: 0
Total errors: 0
Total leaked connections: 1
Total current pool size: 2
Total current idle pool size: 1
-----------------
pulse queue
no queued requests
ending
pulse queue
pulse queue on ending
With `pg-native` as client
➜ node test.mjs
checking client timeout
connecting new client
ending client due to timeout
new client connected
checking client timeout
connecting new client
new client connected
dispatching query
query dispatched
-----------------
Total connections 2
Total acquisitions: 2
Total releases: 1
Total removes: 0
Total errors: 0
Total leaked connections: 1
Total current pool size: 1
Total current idle pool size: 0
-----------------
ending
pulse queue
pulse queue on ending
pulse queue
pulse queue ended
pulse queue
pulse queue ended
With exhausting the pool
You can test this by setting `max: 1` in the reproduction script's config. This illustrates that leaked connections don't get cleaned up when a new connection is requested, meaning it's possible to exhaust the pool with enough connection timeouts.
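Concretely, the only change is the pool config from the reproduction script above (everything else stays the same):

```js
const pool = new Pool({
  port: 5432,
  // With only one slot, the single leaked connection permanently occupies
  // the pool, so later acquires can only wait and then time out themselves.
  max: 1,
  connectionTimeoutMillis: 1000,
  log: (msg) => {
    console.log(msg);
  }
});
```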
With `pg` as client
➜ node test.mjs
checking client timeout
connecting new client
ending client due to timeout
new client connected
/Users/nick/node-postgres/packages/pg-pool/index.js:45
Error.captureStackTrace(err)
^
Error: timeout exceeded when trying to connect
at /Users/nick/node-postgres/packages/pg-pool/index.js:45:11
at async file:///Users/nick/node-postgres/packages/pg-pool/test.mjs:67:13
Node.js v20.17.0
With `pg-native` as client
`pg-native` has even more unusual behavior here, as it totally crashes the script and exits with code 13. No error is thrown.
➜ node test.mjs
checking client timeout
connecting new client
ending client due to timeout
new client connected
If there is another Promise being awaited, then no crash occurs and we resolve as usual (you can test this by creating an unresolved promise before the leaked-connection query and awaiting it after the query is fired off).
const promise = new Promise((resolve) => setTimeout(resolve, 2000));
const res = await pool.query(
  'select count(*) from pg_stat_activity where query_start is NULL and state = $1::text;',
  ['idle']
);
await promise;
The output is as follows:
➜ node test.mjs
checking client timeout
connecting new client
ending client due to timeout
new client connected
checking client timeout
connecting new client
new client connected
dispatching query
query dispatched
pulse queue
no queued requests
-----------------
Total connections 2
Total acquisitions: 2
Total releases: 1
Total removes: 1
Total errors: 0
Total leaked connections: 1
Total current pool size: 1
Total current idle pool size: 0
-----------------
ending
pulse queue
pulse queue on ending
pulse queue
pulse queue ended
Possible fix for `pg`
I was able to resolve this for `pg` specifically with the following change:
diff --git a/packages/pg-pool/index.js b/packages/pg-pool/index.js
index 3e505f79..59c258c6 100644
--- a/packages/pg-pool/index.js
+++ b/packages/pg-pool/index.js
@@ -244,7 +244,7 @@ class Pool extends EventEmitter {
         this.log('ending client due to timeout')
         timeoutHit = true
         // force kill the node driver, and let libpq do its teardown
-        client.connection ? client.connection.stream.destroy() : client.end()
+        client.connection ? client.connection.end() : client.end()
       }, this.options.connectionTimeoutMillis)
     }
This resulted in the following output:
➜ node test.mjs
checking client timeout
connecting new client
ending client due to timeout
client failed to connect
pulse queue
no queued requests
checking client timeout
connecting new client
new client connected
dispatching query
query dispatched
pulse queue
no queued requests
-----------------
Total connections 1
Total acquisitions: 1
Total releases: 1
Total removes: 0
Total errors: 0
Total leaked connections: 0
Total current pool size: 1
Total current idle pool size: 1
-----------------
ending
pulse queue
pulse queue on ending
I have no idea what the ramifications of that change would be, but thought it was worth mentioning.
How did we run into this?
TypeORM's Postgres driver uses the `connect` API here. During a timeout, `pg-pool` doesn't return any sort of functional release callback (which makes sense, given that the connection "timed out"), so there's no way for TypeORM or other consumers to clean up these leaked connections.
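For context, the consumer side looks roughly like the following (a simplified sketch, not TypeORM's actual driver code): when `connect()` rejects on timeout, the caller never receives a client or a release callback, so the orphaned server-side session is invisible to it.

```js
// Simplified sketch of a consumer acquiring a client from pg-pool
// (illustrative only, not TypeORM's actual code).
let client;
try {
  client = await pool.connect();
  await client.query('select 1');
} catch (err) {
  // On connectionTimeoutMillis this rejects before any client is handed out,
  // so there is nothing here to release() or end(); the half-open
  // server-side connection can only be cleaned up by the pool itself.
} finally {
  if (client) {
    client.release();
  }
}
```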