Commit 7c319f5
committed
Make _bt_killitems drop pins it acquired itself.
Teach nbtree's _bt_killitems to leave the so->currPos page that it sets
LP_DEAD items on in whatever state it was in when _bt_killitems was
called. In particular, make sure that so->dropPin scans don't acquire a
pin whose reference is saved in so->currPos.buf.
Allowing _bt_killitems to change so->currPos.buf like this is wrong.
The immediate consequence of allowing it is that code in _bt_steppage
(that copies so->currPos into so->markPos) will behave as if the scan is
a !so->dropPin scan. so->markPos will therefore retain the buffer pin
indefinitely, even though _bt_killitems only needs to acquire a pin
(along with a lock) for long enough to mark known-dead items LP_DEAD.
This issue came to light following a report of a failure of an assertion
from recent commit e6eed40. The test case in question involves the use
of mark and restore. An initial call to _bt_killitems takes place that
leaves so->currPos.buf in a state that is inconsistent with the scan
being so->dropPin. A subsequent call to _bt_killitems for the same
position (following so->currPos being saved in so->markPos, and then
restored as so->currPos) resulted in the failure of an assertion that
tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin
(non-assert builds got a "resource was not closed" WARNING instead).
The same problem exists on earlier releases, though the issue is far
more subtle there. Recent commit e6eed40 introduced the so->dropPin
field as a partial replacement for testing so->currPos.buf directly.
Earlier releases won't get an assertion failure (or buffer pin leak),
but they will allow the second _bt_killitems call from the test case to
behave as if a buffer pin was consistently held since the original call
to _bt_readpage. This is wrong; there will have been an initial window
during which no pin was held on the so->currPos page, and yet the second
_bt_killitems call will neglect to check if so->currPos.lsn continues to
match the page's now-current LSN.
As a result of all this, it's just about possible that _bt_killitems
will set the wrong items LP_DEAD (on release branches). This could only
happen with merge joins (the sole user of nbtree mark/restore support),
when a concurrently inserted index tuple used a recently-recycled TID
(and only when the new tuple was inserted onto the same page as a
distinct concurrently-removed tuple with the same TID). This is exactly
the scenario that _bt_killitems' check of the page's now-current LSN
against the LSN stashed in currPos was supposed to prevent.
A follow-up commit will make nbtree completely stop conditioning whether
or not a position's pin needs to be dropped on whether the 'buf' field
is set. All call sites that might need to drop a still-held pin will be
taught to rely on the scan-level so->dropPin field recently introduced
by commit e6eed40. That will make bugs of the same general nature as
this one impossible (or make them much easier to detect, at least).
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com
Backpatch-through: 131 parent 3614995 commit 7c319f5
2 files changed
+28
-21
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
417 | 417 | | |
418 | 418 | | |
419 | 419 | | |
| 420 | + | |
| 421 | + | |
420 | 422 | | |
421 | 423 | | |
422 | 424 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3323 | 3323 | | |
3324 | 3324 | | |
3325 | 3325 | | |
3326 | | - | |
3327 | | - | |
3328 | | - | |
| 3326 | + | |
| 3327 | + | |
| 3328 | + | |
3329 | 3329 | | |
3330 | | - | |
3331 | | - | |
3332 | | - | |
3333 | | - | |
3334 | | - | |
3335 | | - | |
3336 | | - | |
3337 | | - | |
3338 | | - | |
3339 | | - | |
3340 | | - | |
3341 | | - | |
| 3330 | + | |
| 3331 | + | |
| 3332 | + | |
| 3333 | + | |
| 3334 | + | |
| 3335 | + | |
| 3336 | + | |
| 3337 | + | |
3342 | 3338 | | |
3343 | 3339 | | |
| 3340 | + | |
| 3341 | + | |
| 3342 | + | |
| 3343 | + | |
| 3344 | + | |
| 3345 | + | |
3344 | 3346 | | |
3345 | 3347 | | |
3346 | 3348 | | |
| |||
3353 | 3355 | | |
3354 | 3356 | | |
3355 | 3357 | | |
| 3358 | + | |
3356 | 3359 | | |
3357 | 3360 | | |
3358 | 3361 | | |
| |||
3369 | 3372 | | |
3370 | 3373 | | |
3371 | 3374 | | |
3372 | | - | |
| 3375 | + | |
| 3376 | + | |
3373 | 3377 | | |
3374 | 3378 | | |
3375 | 3379 | | |
3376 | | - | |
3377 | 3380 | | |
3378 | 3381 | | |
3379 | 3382 | | |
| |||
3391 | 3394 | | |
3392 | 3395 | | |
3393 | 3396 | | |
3394 | | - | |
3395 | 3397 | | |
3396 | 3398 | | |
3397 | | - | |
| 3399 | + | |
3398 | 3400 | | |
3399 | 3401 | | |
3400 | 3402 | | |
| |||
3511 | 3513 | | |
3512 | 3514 | | |
3513 | 3515 | | |
3514 | | - | |
| 3516 | + | |
3515 | 3517 | | |
3516 | 3518 | | |
3517 | | - | |
| 3519 | + | |
| 3520 | + | |
| 3521 | + | |
| 3522 | + | |
3518 | 3523 | | |
3519 | 3524 | | |
3520 | 3525 | | |
| |||
0 commit comments