Skip to content

Conversation

@melanieplageman
Copy link
Owner

No description provided.

tglsfdc and others added 30 commits August 14, 2020 22:14
nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan
has the form of one or more OpExprs whose LHS is an expression of the
outer query's, while the RHS is an expression over Params representing
output columns of the subquery.  However, the planner only went as far
as verifying that the clauses were all binary OpExprs.  This works
99.99% of the time, because the clauses have the right shape when
emitted by the parser --- but it's possible for function inlining to
break that, as reported by PegoraroF10.  To fix, teach the planner
to check that the LHS and RHS contain the right things, or more
accurately don't contain the wrong things.  Given that this has been
broken for years without anyone noticing, it seems sufficient to just
give up hashing when it happens, rather than go to the trouble of
commuting the clauses back again (which wouldn't necessarily work
anyway).

While poking at that, I also noticed that nodeSubplan.c had a baked-in
assumption that the number of hash clauses is identical to the number
of subquery output columns.  Again, that's fine as far as parser output
goes, but it's not hard to break it via function inlining.  There seems
little reason for that assumption though --- AFAICS, the only thing
it's buying us is not having to store the number of hash clauses
explicitly.  Adding code to the planner to reject such cases would take
more code than getting nodeSubplan.c to cope, so I fixed it that way.

This has been broken for as long as we've had hashable SubPlans,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com
Commit 7259736 added two variables CheckXidAlive and bsysscan to detect
concurrent aborts and used these in inline functions that are part of the
API that can be used by extensions. So it is better to mark them with
PGDLLIMPORT.

Reported-by: Thomas Munro
Discussion: https://postgr.es/m/CA+hUKGJ7+HYupd=Pz9+QrXa-C_YnbC4rAYu8V+=OKg=UgdzMeg@mail.gmail.com
The version being checked for is 20 years old.

Reviewed-by: Marco Atzeri <marco.atzeri@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/aa266ede-baaa-f4e6-06cf-5b1737610e9a%402ndquadrant.com
Since commit 865f14a made "=>" unusable as an operator name,
it's been impossible either to install hstore 1.0 or to execute
this update script.  There's not much point in continuing
to ship it.

Discussion: https://postgr.es/m/653936.1597431032@sss.pgh.pa.us
The SimpleLruTruncate() header comment states the new coding rule.  To
achieve this, add locktype "frozenid" and two LWLocks.  This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors.  Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit.  If a user's physical
replication primary logged ":  apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms.  At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point.  One can test a cluster for
this data loss by running VACUUM FREEZE in every database.  Back-patch
to 9.5 (all supported versions).

Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
Put the -r option in the right section (it certainly isn't an
option controlling "the location and format of the output").

Clarify the behavior of the tablespace and waldir options
(that part per gripe from robert@interactive.co.uk).

Make a large number of small copy-editing fixes in text that
visibly wasn't written by native speakers, and try to avoid
grammatical inconsistencies between the descriptions of
the different options.

Back-patch to v13, since HEAD hasn't meaningfully diverged yet.

Discussion: https://postgr.es/m/159749418850.14322.216503677134569752@wrigleys.postgresql.org
Reuse cautionary language from src/test/ssl/README in
src/test/kerberos/README.  SLRUs have had access to six-character
segments names since commit 73c986a,
and recovery stopped calling HeapTupleHeaderAdvanceLatestRemovedXid() in
commit 558a916.  The other corrections
are more self-evident.
This is temporary. While possibly some more error checking / debugging
in this path would be a good thing, it'll not look exactly like this.

Discussion: https://postgr.es/m/20200816181604.l54m6kss5ntd6xow@alap3.anarazel.de
This bug, recently introduced in 941697c, at least lead to vacuum
failing because it found tuples inserted by a running transaction, but
below the freeze limit. The freeze limit in turn is directly affected
by the aforementioned bug.

Thanks to Tom Lane figuring how to make the bug reproducible.

We should add a few more assertions to make sure this type of bug
isn't as hard to notice, but it's not yet clear how to best do so.

Co-Diagnosed-By: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/1013484.1597609043@sss.pgh.pa.us
Since 806a2ae, the work of the bgwriter is split the checkpointer, but a
portion of the documentation did not get the message.

Author: Masahiko Sawada
Discussion: https://postgr.es/m/CA+fd4k6jXxjAtjMVC=wG3=QGpauZBtcgN3Jhw+oV7zXGKVLKzQ@mail.gmail.com
Backpatch-through: 9.5
A number of client programs have a "--progress" option that when printing
to a TTY, updates the current line by printing a '\r' and overwriting it.
After the last line, '\n' needs to be printed to move the cursor to the
next line. pg_basebackup and pgbench got this right, but pg_rewind and
pg_checksums were slightly wrong. pg_rewind printed the newline to stdout
instead of stderr, and pg_checksums printed the newline even when not
printing to a TTY. Fix them, and also add a 'finished' argument to
pg_basebackup's progress_report() function, to keep it consistent with
the other programs.

Backpatch to v12. pg_rewind's newline was broken with the logging changes
in commit cc8d415 in v12, and pg_checksums was introduced in v12.

Discussion: https://www.postgresql.org/message-id/82b539e5-ae33-34b0-1aee-22b3379fd3eb@iki.fi
xact.h included utils/datetime.h, which cannot be used in the frontend
(it includes fmgr.h, which needs Datum). But xact.h only needs the
definition of TimestampTz from it, which is available directly in
datatypes/timestamp.h. Change xact.h to include that instead of
utils/datetime.h, so that it can be used in client programs.
If a commit or abort record includes "dropped relfilenodes", then replaying
the record will remove data files. That is surely a "special rel update",
but the records were not marked as such. Fix that, teach pg_rewind to
expect and ignore them, and add a test case to cover it.

It's always been like this, but no backporting for fear of breaking
existing applications. If an application parsed the WAL but was not
handling commit/abort records, it would stop working. That might be a good
thing if it really needed to handle the dropped rels, but it will be caught
when the application is updated to work with PostgreSQL v14 anyway.

Discussion: https://www.postgresql.org/message-id/07b33e2c-46a6-86a1-5f9e-a7da73fddb95%40iki.fi
Reviewed-by: Amit Kapila, Michael Paquier
The description of what select_common_type() does was not terribly
accurate.  Improve it.

David Johnston and Tom Lane

Discussion: https://postgr.es/m/1019930.1597613200@sss.pgh.pa.us
This should improve stability in the tests.

Per buildfarm member hyrax (CLOBBER_CACHE_ALWAYS) via Tom Lane.

Discussion: https://postgr.es/m/871534.1597503261@sss.pgh.pa.us
PL/Sample is an example template of procedural-language handler.  This
can be used as a base to implement a custom PL, or as a facility to test
APIs dedicated to PLs.  Much more could be done in this module, like
adding a simple validator, but this is left as future work.

The documentation included originally some C code to understand the
basics of PL handler implementation, but it was outdated, and not really
helpful either if trying to implement a new procedural language,
particularly when it came to the integration of a PL installation with
CREATE EXTENSION.

Author: Mark Wong
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/20200612172648.GA3327@2ndQuadrant.com
Introduced in adbe62d.  Per buildfarm member prion, when using
RELCACHE_FORCE_RELEASE.
Previous commits made it faster/more scalable to compute snapshots. But not
building a snapshot is still faster. Now that GetSnapshotData() does not
maintain RecentGlobal* anymore, that is actually not too hard:

This commit introduces xactCompletionCount, which tracks the number of
top-level transactions with xids (i.e. which may have modified the database)
that completed in some form since the start of the server.

We can avoid rebuilding the snapshot's contents whenever the current
xactCompletionCount is the same as it was when the snapshot was
originally built.  Currently this check happens while holding
ProcArrayLock. While it's likely possible to perform the check without
acquiring ProcArrayLock, it seems better to do that separately /
later, some careful analysis is required. Even with the lock this is a
significant win on its own.

On a smaller two socket machine this gains another ~1.03x, on a larger
machine the effect is roughly double (earlier patch version tested
though).  If we were able to safely avoid the lock there'd be another
significant gain on top of that.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
As Tom Lane pointed out, it could defeat the compiler's printf() format
string verification.

Backpatch to v12, like that patch that introduced it.

Discussion: https://www.postgresql.org/message-id/1069283.1597672779%40sss.pgh.pa.us
When preparing a transaction xactCompletionCount needs to be
incremented, even though the transaction has not committed
yet. Otherwise the snapshot used within the transaction otherwise can
get reused outside of the prepared transaction. As GetSnapshotData()
does not include the current xid when building a snapshot, reuse would
not be correct.

Somewhat surprisingly the regression tests only rarely show incorrect
results without the fix. The reason for that is that often the
snapshot's xmax will be >= the backend xid, yielding a snapshot that
is correct, despite the bug.

I'm working on a reliable test for the bug, but it seems worth seeing
whether this fixes all the BF failures while I do.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/E1k7tGP-0005V0-5k@gemulon.postgresql.org
This view displays the usages of all the memory contexts of the server
process attached to the current session. This information is useful to
investigate the cause of backend-local memory bloat.

This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But this technique
cannot be uesd in some environments because no debugger is available there.
And it outputs lots of text messages and it's not easy to analyze them.
So, pg_backend_memory_contexts view allows us to access to backend-local
memory contexts information more easily.

Bump catalog version.

Author: Atsushi Torikoshi, Fujii Masao
Reviewed-by: Tatsuhito Kasahara, Andres Freund, Daniel Gustafsson, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com
Commit a477bfc fixed eval_const_expressions() to ensure that it
didn't generate unnecessary RelabelType nodes, but I failed to notice
that some other places in the planner had the same issue.  Really
noplace in the planner should be using plain makeRelabelType(), for
fear of generating expressions that should be equal() to semantically
equivalent trees, but aren't.

An example is that because canonicalize_ec_expression() failed
to be careful about this, we could end up with an equivalence class
containing both a plain Const, and a Const-with-RelabelType
representing exactly the same value.  So far as I can tell this led to
no visible misbehavior, but we did waste a bunch of cycles generating
and evaluating "Const = Const-with-RelabelType" to prove such entries
are redundant.

Hence, move the support function added by a477bfc to where it can
be more generally useful, and use it in the places where planner code
previously used makeRelabelType.

Back-patch to v12, like the previous patch.  While I have no concrete
evidence of any real misbehavior here, it's certainly possible that
I overlooked a case where equivalent expressions that aren't equal()
could cause a user-visible problem.  In any case carrying extra
RelabelType nodes through planning to execution isn't very desirable.

Discussion: https://postgr.es/m/1311836.1597781384@sss.pgh.pa.us
This corrects an oversight by me in 2072932, which made
ProcArrayClearTransaction() increment xactCompletionCount. That requires an
exclusive lock, obviously.

There's other approaches that avoid the exclusive acquisition, but given that a
2PC commit is fairly heavyweight, it doesn't seem worth doing so. I've not been
able to measure a performance difference, unsurprisingly.  I did add a
comment documenting that we could do so, should it ever become a bottleneck.

Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/1355915.1597794204@sss.pgh.pa.us
When the leftover invalid index is "ccold", there's no need to re-run
the command.  Reword the instructions to make that explicit.

Backpatch to 12, where REINDEX CONCURRENTLY appeared.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/20200819211312.GA15497@alvherre.pgsql
This reverts commit 49967da.

Enough time has passed that we can be confident that 07f32fc
resolved the issue. Therefore we can remove the temporary debugging
aids.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/E1k7tGP-0005V0-5k@gemulon.postgresql.org
Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAApHDvobgmCs6CohqhKTUf7D8vffoZXQTCBTERo9gbOeZmvLTw%40mail.gmail.com
Backpatch-through: 11, where JIT was added
Commit ce77abe allowed EXPLAIN (BUFFERS) to report the information
on buffer usage during planning phase. However three issues were
reported regarding this feature.

(1) Previously, EXPLAIN option BUFFERS required ANALYZE. So the query
    had to be actually executed by specifying ANALYZE even when we
    want to see only the planner's buffer usage. This was inconvenient
    especially when the query was write one like DELETE.

(2) EXPLAIN included the planner's buffer usage in summary
    information. So SUMMARY option had to be enabled to report that.
    Also this format was confusing.

(3) The output structure for planning information was not consistent
    between TEXT format and the others. For example, "Planning" tag
    was output in JSON format, but not in TEXT format.

For (1), this commit allows us to perform EXPLAIN (BUFFERS) without
ANALYZE to report the planner's buffer usage.

For (2), this commit changed EXPLAIN output so that the planner's
buffer usage is reported before summary information.

For (3), this commit made the output structure for planning
information more consistent between the formats.

Back-patch to v13 where the planner's buffer usage was allowed to
be reported in EXPLAIN.

Reported-by: Pierre Giraud, David Rowley
Author: Fujii Masao
Reviewed-by: David Rowley, Julien Rouhaud, Pierre Giraud
Discussion: https://postgr.es/m/07b226e6-fa49-687f-b110-b7c37572f69e@dalibo.com
Commit 9d701e6 caused the regression test for EXPLAIN to fail on
the buildfarm member prion. This happened because of instability of
test output, i.e., in text format, whether "Planning:" line is output
varies depending on the system state.

This commit updated the regression test so that it ignores that
"Planning:" line to produce more stable test output and get rid of
the test failure.

Back-patch to v13.

Author: Fujii Masao
Discussion: https://postgr.es/m/1803897.1598021621@sss.pgh.pa.us
@melanieplageman melanieplageman force-pushed the ahj_pr branch 2 times, most recently from 5a10d7d to d6c2806 Compare August 29, 2020 00:39
michaelpq and others added 8 commits August 30, 2020 14:14
A failure when dropping concurrently an index used in a replica identity
could leave in pg_index an index marked as !indisvalid and
indisreplident.  Reindexing this index would switch back indisvalid to
true, and if the replica identity of the parent relation was switched to
use a different index, it would be possible to finish with more than one
index marked as indisreplident.  If that were to happen, this could mess
up with the relation cache as an incorrect index could be used for the
replica identity.

Indexes marked as invalid are discarded as candidates for the replica
identity, as of RelationGetIndexList(), so similarly to what is done
with indisclustered, resetting indisreplident when the index is marked
as invalid keeps things consistent.  REINDEX CONCURRENTLY's swapping
already resets the flag for the old index, while the new index inherits
the value of the old index to-be-dropped, so only DROP INDEX was an
issue.

Even if this is a bug, the sequence able to reproduce a problem requires
a failure while running DROP INDEX CONCURRENTLY, something unlikely
going to happen in the field, so no backpatch is done.

Author: Michael Paquier
Reviewed-by: Dmitry Dolgov
Discussion: https://postgr.es/m/20200827025721.GN2017@paquier.xyz
Historically, we've considered the state with relpages and reltuples
both zero as indicating that we do not know the table's tuple density.
This is problematic because it's impossible to distinguish "never yet
vacuumed" from "vacuumed and seen to be empty".  In particular, a user
cannot use VACUUM or ANALYZE to override the planner's normal heuristic
that an empty table should not be believed to be empty because it is
probably about to get populated.  That heuristic is a good safety
measure, so I don't care to abandon it, but there should be a way to
override it if the table is indeed intended to stay empty.

Hence, represent the initial state of ignorance by setting reltuples
to -1 (relpages is still set to zero), and apply the minimum-ten-pages
heuristic only when reltuples is still -1.  If the table is empty,
VACUUM or ANALYZE (but not CREATE INDEX) will override that to
reltuples = relpages = 0, and then we'll plan on that basis.

This requires a bunch of fiddly little changes, but we can get rid of
some ugly kluges that were formerly needed to maintain the old definition.

One notable point is that FDWs' GetForeignRelSize methods will see
baserel->tuples = -1 when no ANALYZE has been done on the foreign table.
That seems like a net improvement, since those methods were formerly
also in the dark about what baserel->tuples = 0 really meant.  Still,
it is an API change.

I bumped catversion because code predating this change would get confused
by seeing reltuples = -1.

Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
…ted.

Per discussion, we're planning to remove parser support for postfix
operators in order to simplify the grammar.  So it behooves us to
put out a deprecation notice at least one release before that.

There is only one built-in postfix operator, ! for factorial.
Label it deprecated in the docs and in pg_description, and adjust
some examples that formerly relied on it.  (The sister prefix
operator !! is also deprecated.  We don't really have to remove
that one, but since we're suggesting that people use factorial()
instead, it seems better to remove both operators.)

Also state in the CREATE OPERATOR ref page that postfix operators
in general are going away.

Although this changes the initial contents of pg_description,
I did not force a catversion bump; it doesn't seem essential.

In v13, also back-patch 4c5cf54, so that there's someplace for
the <link>s to point to.

Mark Dilger and John Naylor, with some adjustments by me

Discussion: https://postgr.es/m/BE2DF53D-251A-4E26-972F-930E523580E9@enterprisedb.com
It has always (since the first commit) worked with relative paths, so
use the same wording as other parts of the documentation.

Author: Bruce Momjian
Discussion: https://postgr.es/m/CABUevExx-hm=cit+A9LeKBH39srvk8Y2tEZeEAj5mP8YfzNKUg@mail.gmail.com
This was not clearly documented when procedures were added in PG 11.

Reported-by: Robin Abbi

Discussion: https://postgr.es/m/CAGmg_NX327KKVuJmbWZD=pGutYFxzZjX1rU+3ji8UuX=8ONn9Q@mail.gmail.com

Backpatch-through: 11
Adds constraints and improves wording.

Reported-by: 2552891@gmail.com

Discussion: https://postgr.es/m/159586122762.680.1361378513036616007@wrigleys.postgresql.org

Backpatch-through: 9.5
It wasn't clear the non-integers are cast to integers for subscripting,
rather than throwing an error.

Reported-by: sean@materialize.io

Discussion: https://postgr.es/m/159538675800.624.7728794628229799531@wrigleys.postgresql.org

Backpatch-through: 9.5
If the inner side tuples of a hashjoin will not fit in memory, the
hashjoin can be executed in multiple batches. If the statistics on the
inner side relation are accurate, planner chooses a multi-batch
strategy and sets the number of batches.
The query executor measures the real size of the hashtable and increases
the number of batches if the hashtable grows too large.

The number of batches is always a power of two, so an increase in the
number of batches doubles it.

Serial hashjoin measures batch size lazily -- waiting until it is
loading a batch to determine if it will fit in memory.

Parallel hashjoin, on the other hand, completes all changes to the
number of batches during the build phase. If it doubles the number of
batches, it dumps all the tuples out, reassigns them to batches,
measures each batch, and checks that it will fit in the space allowed.

In both cases, the executor currently makes a best effort. If a
particular batch won't fit in memory, and, upon changing the number of
batches none of the tuples move to a new batch, the executor disables
growth in the number of batches globally. After growth is disabled, all
batches that would have previously triggered an increase in the number
of batches instead exceed the space allowed.

There is no mechanism to perform a hashjoin within memory constraints if
a run of tuples hash to the same batch. Also, hashjoin will continue to
double the number of batches if *some* tuples move each time -- even if
the batch will never fit in memory -- resulting in an explosion in the
number of batches (affecting performance negatively for multiple
reasons).

Adaptive hashjoin is a mechanism to process a run of inner side tuples
with join keys which hash to the same batch in a manner that is
efficient and respects the space allowed.

When an offending batch causes the number of batches to be doubled and
some percentage of the tuples would not move to a new batch, that batch
can be marked to "fall back". This mechanism replaces serial hashjoin's
"grow_enabled" flag and replaces part of the functionality of parallel
hashjoin's "growth = PHJ_GROWTH_DISABLED" flag. However, instead of
disabling growth in the number of batches for all batches, it only
prevents this batch from causing another increase in the number of
batches.

When the inner side of this batch is loaded into memory, stripes of
arbitrary tuples totaling work_mem in size are loaded into the
hashtable. After probing this stripe, the outer side batch is rewound
and the next stripe is loaded. Each stripe of inner is probed until all
tuples have been processed.

Tuples that match are emitted (depending on the join semantics of the
particular join type) during probing of a stripe. In order to make
left outer join work, unmatched tuples cannot be emitted NULL-extended
until all stripes have been probed. To address this, a bitmap is created
with a bit for each tuple of the outer side. If a tuple on the outer
side matches a tuple from the inner, the corresponding bit is set. At
the end of probing all stripes, the executor scans the bitmap and emits
unmatched outer tuples.

Batch 0 falls back for serial hashjoin but does not yet fall back for
parallel hashjoin. David Kimura is working on a separate patch for this.

TODOs:
- Better solution to deadlock hazard with waiting on a barrier after
  emitting tuples
- Experiment with different fallback threshholds
  (currently hardcoded to 80% but parameterizable)
- Improve stripe instrumentation implementation for serial and parallel
- Assorted TODOs in the code

Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Co-authored-by: David Kimura <dkimura@pivotal.io>
Spilling of batch 0 for parallel hash join is now implemented.

While working on this, it occurred to us that explicit striping during
the build phase was not required. This patch splits tuples into work_mem
sized stripes while loading them.
This is done exclusively for fallback batches, so normal hashjoins will
not incur the overhead.

There are two major items (that I know about) that still need to be
dealt with:
- Pausing and resuming loading for each stripe
  For now, I added a new STS mode called Append to support the current
  "overflow" design, however, I have other design proposals
- parallel stripe instrumentation needs to change (I have design
  proposals for that).

This commit also comments out a test which inserts tuples larger than
work_mem in size (each), which no longer successfully executes.

Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
melanieplageman pushed a commit that referenced this pull request Aug 2, 2021
Due to how pg_size_pretty(bigint) was implemented, it's possible that when
given a negative number of bytes that the returning value would not match
the equivalent positive return value when given the equivalent positive
number of bytes.  This was due to two separate issues.

1. The function used bit shifting to convert the number of bytes into
larger units.  The rounding performed by bit shifting is not the same as
dividing.  For example -3 >> 1 = -2, but -3 / 2 = -1.  These two
operations are only equivalent with positive numbers.

2. The half_rounded() macro rounded towards positive infinity.  This meant
that negative numbers rounded towards zero and positive numbers rounded
away from zero.

Here we fix #1 by dividing the values instead of bit shifting.  We fix postgres#2
by adjusting the half_rounded macro always to round away from zero.

Additionally, adjust the pg_size_pretty(numeric) function to be more
explicit that it's using division rather than bit shifting.  A casual
observer might have believed bit shifting was used due to a static
function being named numeric_shift_right.  However, that function was
calculating the divisor from the number of bits and performed division.
Here we make that more clear.  This change is just cosmetic and does not
affect the return value of the numeric version of the function.

Here we also add a set of regression tests both versions of
pg_size_pretty() which test the values directly before and after the
function switches to the next unit.

This bug was introduced in 8a1fab3. Prior to that negative values were
always displayed in bytes.

Author: Dean Rasheed, David Rowley
Discussion: https://postgr.es/m/CAEZATCXnNW4HsmZnxhfezR5FuiGgp+mkY4AzcL5eRGO4fuadWg@mail.gmail.com
Backpatch-through: 9.6, where the bug was introduced.
melanieplageman pushed a commit that referenced this pull request Sep 4, 2021
Currently the pc files use hard coded paths for "includedir" and
"libdir."

Example:

  Cflags: -I/usr/include
  Libs: -L/usr/lib -lpq

This is not very fortunate when cross compiling inside a buildroot,
where the includes and libs are inside a staging directory, because
this introduces host paths into the build:

  checking for pkg-config... /builder/shared-workdir/build/sdk/staging_dir/host/bin/pkg-config
  checking for PostgreSQL libraries via pkg_config... -L/usr/lib <----

This commit addresses this by doing the following two things:

  1. Instead of hard coding the paths in "Cflags" and "Libs"
     "${includedir}" and "${libdir}" are used.  Note: these variables
     can be overriden on the pkg-config command line
     ("--define-variable=libdir=/some/path").

  2. Add the variables "prefix" and "exec_prefix".  If "includedir"
     and/or "libdir" are using these then construct them accordingly.
     This is done because buildroots (for instance OpenWrt) tend to
     rename the real pkg-config and call it indirectly from a script
     that sets "prefix", "exec_prefix" and "bindir", like so:

     pkg-config.real --define-variable=prefix=${STAGING_PREFIX} \
       --define-variable=exec_prefix=${STAGING_PREFIX} \
       --define-variable=bindir=${STAGING_PREFIX}/bin $@

Example #1: user calls ./configure with "--libdir=/some/lib" and
"--includedir=/some/include":

  prefix=/usr/local/pgsql
  exec_prefix=${prefix}
  libdir=/some/lib
  includedir=/some/include

  Name: libpq
  Description: PostgreSQL libpq library
  Url: http://www.postgresql.org/
  Version: 12.1
  Requires:
  Requires.private:
  Cflags: -I${includedir}
  Libs: -L${libdir} -lpq
  Libs.private:  -lcrypt -lm

Example postgres#2: user calls ./configure with no arguments:

  prefix=/usr/local/pgsql
  exec_prefix=${prefix}
  libdir=${exec_prefix}/lib
  includedir=${prefix}/include

  Name: libpq
  Description: PostgreSQL libpq library
  Url: http://www.postgresql.org/
  Version: 12.1
  Requires:
  Requires.private:
  Cflags: -I${includedir}
  Libs: -L${libdir} -lpq
  Libs.private:  -lcrypt -lm

Like this the paths can be forced into the staging directory when
using a buildroot setup:

  checking for pkg-config... /home/sk/tmp/openwrt/staging_dir/host/bin/pkg-config
  checking for PostgreSQL libraries via pkg_config... -L/home/sk/tmp/openwrt/staging_dir/target-mips_24kc_musl/usr/lib

Author: Sebastian Kemper <sebastian_ml@gmx.net>
Co-authored-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/20200305213827.GA25135%40darth.lan
melanieplageman pushed a commit that referenced this pull request Aug 22, 2022
Remove four probes for members of sockaddr_storage.  Keep only the probe
for sockaddr's sa_len, which is enough for our two remaining places that
know about _len fields:

1.  ifaddr.c needs to know if sockaddr has sa_len to understand the
result of ioctl(SIOCGIFCONF).  Only AIX is still using the relevant code
today, but it seems like a good idea to keep it compilable on Linux.

2.  ip.c was testing for presence of ss_len to decide whether to fill in
sun_len in our getaddrinfo_unix() function.  It's just as good to test
for sa_len.  If you have one, you have them all.

(The code in postgres#2 isn't actually needed at all on several OSes I checked
since modern versions ignore sa_len on input to system calls.  Proving
that's the case for all relevant OSes is left for another day, but
wouldn't get rid of that last probe anyway if we still want it for #1.)

Discussion: https://postgr.es/m/CA%2BhUKGJJjF2AqdU_Aug5n2MAc1gr%3DGykNjVBZq%2Bd6Jrcp3Dyvg%40mail.gmail.com
melanieplageman pushed a commit that referenced this pull request Aug 3, 2023
Commit e7cb7ee, which introduced the infrastructure for FDWs and
custom scan providers to replace joins with scans, failed to add support
handling of pseudoconstant quals assigned to replaced joins in
createplan.c, leading to an incorrect plan without a gating Result node
when postgres_fdw replaced a join with such a qual.

To fix, we could add the support by 1) modifying the ForeignPath and
CustomPath structs to store the list of RestrictInfo nodes to apply to
the join, as in JoinPaths, if they represent foreign and custom scans
replacing a join with a scan, and by 2) modifying create_scan_plan() in
createplan.c to use that list in that case, instead of the
baserestrictinfo list, to get pseudoconstant quals assigned to the join;
but #1 would cause an ABI break.  So fix by modifying the infrastructure
to just disallow replacing joins with such quals.

Back-patch to all supported branches.

Reported by Nishant Sharma.  Patch by me, reviewed by Nishant Sharma and
Richard Guo.

Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
melanieplageman pushed a commit that referenced this pull request Nov 15, 2023
Default privileges are represented as NULL::aclitem[] in catalog ACL
columns, while revoking all privileges leaves an empty aclitem[].
These two cases used to produce identical output in psql meta-commands
like \dp.  Using something like "\pset null '(default)'" as a
workaround for spotting the difference did not work, because null
values were always displayed as empty strings by describe.c's
meta-commands.

This patch improves that with two changes:

1. Print "(none)" for empty privileges so that the user is able to
   distinguish them from default privileges, even without special
   workarounds.

2. Remove the special handling of null values in describe.c,
   so that "\pset null" is honored like everywhere else.
   (This affects all output from these commands, not only ACLs.)

The privileges shown by \dconfig+ and \ddp as well as the column
privileges shown by \dp are not affected by change #1, because the
respective aclitem[] is reset to NULL or deleted from the catalog
instead of leaving an empty array.

Erik Wienhold and Laurenz Albe

Discussion: https://postgr.es/m/1966228777.127452.1694979110595@office.mailbox.org
melanieplageman pushed a commit that referenced this pull request May 14, 2024
Commit e7cb7ee, which introduced the infrastructure for FDWs and
custom scan providers to replace joins with scans, failed to add support
handling of pseudoconstant quals assigned to replaced joins in
createplan.c, leading to an incorrect plan without a gating Result node
when postgres_fdw replaced a join with such a qual.

To fix, we could add the support by 1) modifying the ForeignPath and
CustomPath structs to store the list of RestrictInfo nodes to apply to
the join, as in JoinPaths, if they represent foreign and custom scans
replacing a join with a scan, and by 2) modifying create_scan_plan() in
createplan.c to use that list in that case, instead of the
baserestrictinfo list, to get pseudoconstant quals assigned to the join;
but #1 would cause an ABI break.  So fix by modifying the infrastructure
to just disallow replacing joins with such quals.

Back-patch to all supported branches.

Reported by Nishant Sharma.  Patch by me, reviewed by Nishant Sharma and
Richard Guo.

Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
melanieplageman pushed a commit that referenced this pull request Jun 25, 2024
Due to how pg_size_pretty(bigint) was implemented, it's possible that when
given a negative number of bytes that the returning value would not match
the equivalent positive return value when given the equivalent positive
number of bytes.  This was due to two separate issues.

1. The function used bit shifting to convert the number of bytes into
larger units.  The rounding performed by bit shifting is not the same as
dividing.  For example -3 >> 1 = -2, but -3 / 2 = -1.  These two
operations are only equivalent with positive numbers.

2. The half_rounded() macro rounded towards positive infinity.  This meant
that negative numbers rounded towards zero and positive numbers rounded
away from zero.

Here we fix #1 by dividing the values instead of bit shifting.  We fix postgres#2
by adjusting the half_rounded macro always to round away from zero.

Additionally, adjust the pg_size_pretty(numeric) function to be more
explicit that it's using division rather than bit shifting.  A casual
observer might have believed bit shifting was used due to a static
function being named numeric_shift_right.  However, that function was
calculating the divisor from the number of bits and performed division.
Here we make that more clear.  This change is just cosmetic and does not
affect the return value of the numeric version of the function.

Here we also add a set of regression tests both versions of
pg_size_pretty() which test the values directly before and after the
function switches to the next unit.

This bug was introduced in 8a1fab3. Prior to that negative values were
always displayed in bytes.

Author: Dean Rasheed, David Rowley
Discussion: https://postgr.es/m/CAEZATCXnNW4HsmZnxhfezR5FuiGgp+mkY4AzcL5eRGO4fuadWg@mail.gmail.com
Backpatch-through: 9.6, where the bug was introduced.
melanieplageman pushed a commit that referenced this pull request Jun 25, 2024
Commit e7cb7ee, which introduced the infrastructure for FDWs and
custom scan providers to replace joins with scans, failed to add support
handling of pseudoconstant quals assigned to replaced joins in
createplan.c, leading to an incorrect plan without a gating Result node
when postgres_fdw replaced a join with such a qual.

To fix, we could add the support by 1) modifying the ForeignPath and
CustomPath structs to store the list of RestrictInfo nodes to apply to
the join, as in JoinPaths, if they represent foreign and custom scans
replacing a join with a scan, and by 2) modifying create_scan_plan() in
createplan.c to use that list in that case, instead of the
baserestrictinfo list, to get pseudoconstant quals assigned to the join;
but #1 would cause an ABI break.  So fix by modifying the infrastructure
to just disallow replacing joins with such quals.

Back-patch to all supported branches.

Reported by Nishant Sharma.  Patch by me, reviewed by Nishant Sharma and
Richard Guo.

Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
melanieplageman pushed a commit that referenced this pull request Jun 27, 2024
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:

    TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
    postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
    postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
    postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
    postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
    postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
    postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
    postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
    postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
    postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
    postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
    postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
    postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
    postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
    postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
    postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

    #0  0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
    postgres#2  WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
        wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
    postgres#3  0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
        at ../src/backend/storage/ipc/latch.c:538
    postgres#4  0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
    postgres#5  0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
    postgres#6  0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
    postgres#7  0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
    postgres#8  TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
    postgres#9  0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
    postgres#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
    postgres#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
    postgres#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

    #0  0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    postgres#2  0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
    postgres#3  0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
    postgres#4  0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.

Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.

Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
melanieplageman pushed a commit that referenced this pull request Jul 8, 2024
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:

    TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
    postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
    postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
    postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
    postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
    postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
    postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
    postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
    postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
    postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
    postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
    postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
    postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
    postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
    postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
    postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

    #0  0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
    postgres#2  WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
        wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
    postgres#3  0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
        at ../src/backend/storage/ipc/latch.c:538
    postgres#4  0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
    postgres#5  0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
    postgres#6  0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
    postgres#7  0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
    postgres#8  TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
    postgres#9  0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
    postgres#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
    postgres#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
    postgres#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

    #0  0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    postgres#2  0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
    postgres#3  0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
    postgres#4  0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.

Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.

Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
melanieplageman pushed a commit that referenced this pull request Jul 8, 2024
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:

    TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
    postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
    postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
    postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
    postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
    postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
    postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
    postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
    postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
    postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
    postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
    postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
    postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
    postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
    postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
    postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

    #0  0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
    postgres#2  WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
        wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
    postgres#3  0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
        at ../src/backend/storage/ipc/latch.c:538
    postgres#4  0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
    postgres#5  0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
    postgres#6  0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
    postgres#7  0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
    postgres#8  TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
    postgres#9  0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
    postgres#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
    postgres#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
    postgres#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

    #0  0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    postgres#2  0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
    postgres#3  0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
    postgres#4  0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.

Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.

Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
melanieplageman pushed a commit that referenced this pull request Jul 8, 2024
Commit e7cb7ee, which introduced the infrastructure for FDWs and
custom scan providers to replace joins with scans, failed to add support
handling of pseudoconstant quals assigned to replaced joins in
createplan.c, leading to an incorrect plan without a gating Result node
when postgres_fdw replaced a join with such a qual.

To fix, we could add the support by 1) modifying the ForeignPath and
CustomPath structs to store the list of RestrictInfo nodes to apply to
the join, as in JoinPaths, if they represent foreign and custom scans
replacing a join with a scan, and by 2) modifying create_scan_plan() in
createplan.c to use that list in that case, instead of the
baserestrictinfo list, to get pseudoconstant quals assigned to the join;
but #1 would cause an ABI break.  So fix by modifying the infrastructure
to just disallow replacing joins with such quals.

Back-patch to all supported branches.

Reported by Nishant Sharma.  Patch by me, reviewed by Nishant Sharma and
Richard Guo.

Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
melanieplageman pushed a commit that referenced this pull request Jul 8, 2024
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:

    TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
    postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
    postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
    postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
    postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
    postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
    postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
    postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
    postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
    postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
    postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
    postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
    postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
    postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
    postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
    postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

    #0  0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
    postgres#2  WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
        wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
    postgres#3  0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
        at ../src/backend/storage/ipc/latch.c:538
    postgres#4  0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
    postgres#5  0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
    postgres#6  0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
    postgres#7  0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
    postgres#8  TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
    postgres#9  0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
    postgres#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
    postgres#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
    postgres#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

    #0  0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    postgres#2  0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
    postgres#3  0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
    postgres#4  0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.

Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.

Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
melanieplageman pushed a commit that referenced this pull request Jul 29, 2025
There've been a few complaints that it can be overly difficult to figure
out why the planner picked a Memoize plan.  To help address that, here we
adjust the EXPLAIN output to display the following additional details:

1) The estimated number of cache entries that can be stored at once
2) The estimated number of unique lookup keys that we expect to see
3) The number of lookups we expect
4) The estimated hit ratio

Technically postgres#4 can be calculated using #1, postgres#2 and postgres#3, but it's not a
particularly obvious calculation, so we opt to display it explicitly.
The original patch by Lukas Fittl only displayed the hit ratio, but
there was a fear that might lead to more questions about how that was
calculated.  The idea with displaying all 4 is to be transparent which
may allow queries to be tuned more easily.  For example, if postgres#2 isn't
correct then maybe extended statistics or a manual n_distinct estimate can
be used to help fix poor plan choices.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
melanieplageman pushed a commit that referenced this pull request Oct 9, 2025
truncate_useless_pathkeys() seems to have neglected to account for
PathKeys that might be useful for WindowClause evaluation.  Modify it so
that it properly accounts for that.

Making this work required adjusting two things:

1. Change from checking query_pathkeys to check sort_pathkeys instead.
2. Add explicit check for window_pathkeys

For #1, query_pathkeys gets set in standard_qp_callback() according to the
sort order requirements for the first operation to be applied after the
join planner is finished, so this changes depending on which upper
planner operations a particular query needs.  If the query has window
functions and no GROUP BY, then query_pathkeys gets set to
window_pathkeys.  Before this change, this meant PathKeys useful for the
ORDER BY were not accounted for in queries with window functions.

Because of #1, postgres#2 is now required so that we explicitly check to ensure
we don't truncate away PathKeys useful for window functions.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrj3HTKmXoLMbUjTO=_MNMxM=cnuCSyBKidAVibmYPnrg@mail.gmail.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.