
sql: Performance problems with queries to pg_catalog #57249

Closed
gyrter opened this issue Nov 30, 2020 · 8 comments · Fixed by #57542
Labels: C-bug, O-community, release-blocker, X-blathers-triaged

Comments


gyrter commented Nov 30, 2020

Describe the problem

Hello. I migrated from CockroachDB 20.1.8 to 20.2.2 and ran into problems with application startup. Our ORM, via the PostgreSQL JDBC driver, issues queries to pg_catalog on startup to determine types and relations, and these queries have become very slow. We investigated further and found that the response time correlates with the number of tables in the database. I also found that one node in the cluster still performs well.

To Reproduce

  1. Set up a CockroachDB cluster in k8s.
  2. Create a database with 10 tables.
  3. Got this response time: I201130 10:52:27.993894 5920071 sql/exec_log.go:225 ⋮ [n3,client=‹10.111.7.3:38470›,hostssl,user=‹db1›] 3 112.396ms ‹exec› ‹"PostgreSQL JDBC Driver"› ‹{}› ‹"SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = $1 ORDER BY sp.r, pg_type.oid DESC"› ‹{$1:"'jsonb'"}› 1 ‹""› 0 ‹{ LATENCY_THRESHOLD }›
  4. Add another database with 93 tables.
  5. Got this response time: I201130 10:36:00.786376 5085793 sql/exec_log.go:225 ⋮ [n2,client=‹192.168.114.18:21850›,hostssl,user=‹db1›] 67 520.064ms ‹exec› ‹"PostgreSQL JDBC Driver"› ‹{}› ‹"SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = $1 ORDER BY sp.r, pg_type.oid DESC"› ‹{$1:"'jsonb'"}› 1 ‹""› 0 ‹{ LATENCY_THRESHOLD }›
  6. We also created more tables and the query became slower (one way to bulk-create such tables is sketched right after this list).
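
For reference, a minimal sketch of how dummy tables like these can be bulk-created for reproduction. The t_N names and columns below are placeholders, not our real schema; the generated statements can be piped back into the SQL shell (for example cockroach sql or psql) to actually create the tables:

-- Emit CREATE TABLE statements for 93 throwaway tables; feed the output
-- back into the SQL shell to create them.
SELECT 'CREATE TABLE IF NOT EXISTS t_' || i::STRING || ' (id INT PRIMARY KEY, v STRING);'
  FROM generate_series(1, 93) AS g(i);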

Expected behavior

I expected better query times for catalog queries.

Environment:

  • CockroachDB 20.2.2
  • Kubernetes 1.16.15
  • Ubuntu 18.04.5
  • Linux 5.3.0-51-generic

blathers-crl bot commented Nov 30, 2020

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

blathers-crl bot added the C-bug, O-community, and X-blathers-triaged labels Nov 30, 2020
rafiss added this to Triage in SQL Sessions - Deprecated via automation Nov 30, 2020

rafiss commented Nov 30, 2020

Thanks for the report!

I haven't been able to reproduce this locally. I created a local single-node v20.2.2 cluster with 10 tables, then added 100 more tables in a different database. In both cases, the query took under 20ms.

root@:26257/defaultdb> SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;
  is_array | typtype | typname
-----------+---------+----------
   false   | b       | jsonb
(1 row)

Time: 17ms total (execution 17ms / network 0ms)

A couple of questions for you, @gyrter:

  • Can you confirm that you saw this with v20.2.2? Versions 20.2.0 and 20.2.1 did have an issue with the performance of pg_catalog queries, but PR release-20.2: opt: add cost for table descriptor fetch during virtual scan #56349 made it into v20.2.2 and addressed that issue. So I just want to confirm that you are running a version that includes that PR (a quick way to check the running version is noted after this list).
  • I've only tried running the query from the command line. Can you reproduce the problem by running the query from the command line? Or does it only happen when running from the Java app, which I see also uses a prepared statement?
  • Can you run the following and share the output?
EXPLAIN SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;
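
On the first question above, a quick way to confirm the exact version a node is running (just as a convenience; either of these should work):

-- from any SQL session
SELECT version();
-- or, on the node itself, the CLI equivalent:
-- cockroach version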


gyrter commented Dec 1, 2020

Thank you for the fast response. I will try to help you with this issue.

  • I tested JDBC on 20.2.0, 20.2.1, and 20.2.2 - all versions were affected.
  • Yes, of course. I found an even more interesting query:
    SELECT tmp.table_cat, tmp.table_schem, tmp.table_name, tmp.non_unique, tmp.index_qualifier, tmp.index_name, tmp.type, tmp.ordinal_position, btrim(pg_catalog.pg_get_indexdef(tmp.ci_oid, tmp.ordinal_position, false), '\"') AS column_name, CASE tmp.am_canorder WHEN true THEN CASE tmp.i_indoption[tmp.ordinal_position - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS asc_or_desc, tmp.cardinality, tmp.pages, tmp.filter_condition FROM (SELECT NULL AS table_cat, n.nspname AS table_schem, ct.relname AS table_name, NOT i.indisunique AS non_unique, NULL AS index_qualifier, ci.relname AS index_name, CASE i.indisclustered WHEN true THEN 1 ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3 END END AS type, (information_schema._pg_expandarray(i.indkey)).n AS ordinal_position, ci.reltuples AS cardinality, ci.relpages AS pages, pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS filter_condition, ci.oid AS ci_oid, i.indoption AS i_indoption, am.amcanorder AS am_canorder FROM pg_catalog.pg_class AS ct JOIN pg_catalog.pg_namespace AS n ON (ct.relnamespace = n.oid) JOIN pg_catalog.pg_index AS i ON (ct.oid = i.indrelid) JOIN pg_catalog.pg_class AS ci ON (ci.oid = i.indexrelid) JOIN pg_catalog.pg_am AS am ON (ci.relam = am.oid) WHERE (true AND (n.nspname = 'public')) AND (ct.relname = 'sometable')) AS tmp ORDER BY non_unique, type, index_name, ordinal_position.
    Here are my tests:
~ # time psql -U admin -h v20_2_2 -d db1 -p26257 -f problem_query.sql
...
real	0m9.395s
user	0m0.056s
sys	0m0.009s
~ # time psql -U admin -h v20_1_8  -d db1 -p26257 -f problem_query.sql
...
real	0m1.437s
user	0m0.053s
sys	0m0.014s

Also, this query gets different query plans in different versions of CockroachDB:

  • v20.2.2
                            tree                            |    field     |                 description                  
------------------------------------------------------------+--------------+----------------------------------------------
                                                            | distribution | local
                                                            | vectorized   | false
 sort                                                       |              | 
  │                                                         | order        | +non_unique,+type,+relname,+ordinal_position
  └── render                                                |              | 
       └── render                                           |              | 
            └── project set                                 |              | 
                 └── hash join                              |              | 
                      │                                     | equality     | (relam) = (oid)
                      ├── hash join                         |              | 
                      │    │                                | equality     | (oid) = (indexrelid)
                      │    ├── virtual table                |              | 
                      │    │                                | table        | pg_class@primary
                      │    └── hash join                    |              | 
                      │         │                           | equality     | (indrelid) = (oid)
                      │         ├── virtual table           |              | 
                      │         │                           | table        | pg_index@primary
                      │         └── hash join               |              | 
                      │              │                      | equality     | (relnamespace) = (oid)
                      │              ├── filter             |              | 
                      │              │    │                 | filter       | relname = 'sometable'
                      │              │    └── virtual table |              | 
                      │              │                      | table        | pg_class@primary
                      │              └── filter             |              | 
                      │                   │                 | filter       | nspname = 'public'
                      │                   └── virtual table |              | 
                      │                                     | table        | pg_namespace@primary
                      └── virtual table                     |              | 
                                                            | table        | pg_am@primary
  • v20.1.8
                              tree                               |    field    |                 description                  
-----------------------------------------------------------------+-------------+----------------------------------------------
                                                                 | distributed | false
                                                                 | vectorized  | false
 render                                                          |             | 
  └── sort                                                       |             | 
       │                                                         | order       | +non_unique,+type,+relname,+ordinal_position
       └── render                                                |             | 
            └── render                                           |             | 
                 └── project set                                 |             | 
                      └── hash-join                              |             | 
                           │                                     | type        | inner
                           │                                     | equality    | (oid) = (relam)
                           ├── virtual table                     |             | 
                           │                                     | source      | 
                           └── hash-join                         |             | 
                                │                                | type        | inner
                                │                                | equality    | (oid) = (indexrelid)
                                ├── virtual table                |             | 
                                │                                | source      | 
                                └── hash-join                    |             | 
                                     │                           | type        | inner
                                     │                           | equality    | (indrelid) = (oid)
                                     ├── virtual table           |             | 
                                     │                           | source      | 
                                     └── hash-join               |             | 
                                          │                      | type        | inner
                                          │                      | equality    | (relnamespace) = (oid)
                                          ├── filter             |             | 
                                          │    │                 | filter      | relname = 'sometable'
                                          │    └── virtual table |             | 
                                          │                      | source      | 
                                          └── filter             |             | 
                                               │                 | filter      | nspname = 'public'
                                               └── virtual table |             | 
                                                                 | source      | 

The main difference: this query has a join on pg_am.

  • Output for EXPLAIN SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;
~ # time psql -U admin -h v20_2_2 -d db1 -p26257 -f test_query.sql 
Password for user admin: 
                  tree                  |    field     |      description       
----------------------------------------+--------------+------------------------
                                        | distribution | local
                                        | vectorized   | false
 sort                                   |              | 
  │                                     | order        | +generate_series,-oid
  └── render                            |              | 
       └── hash join (right outer)      |              | 
            │                           | equality     | (oid) = (typnamespace)
            ├── hash join               |              | 
            │    │                      | equality     | (nspname) = (nspname)
            │    ├── virtual table      |              | 
            │    │                      | table        | pg_namespace@primary
            │    └── render             |              | 
            │         └── project set   |              | 
            │              └── emptyrow |              | 
            └── filter                  |              | 
                 │                      | filter       | typname = 'jsonb'
                 └── virtual table      |              | 
                                        | table        | pg_type@primary
(18 rows)


real	0m5.982s
user	0m0.052s
sys	0m0.016s
 ~ # time psql -U admin -h v20_1_8 -d db1 -p26257 -f test_query.sql 
Password for user admin: 
                    tree                     |    field    |      description       
---------------------------------------------+-------------+------------------------
                                             | distributed | false
                                             | vectorized  | false
 render                                      |             | 
  └── sort                                   |             | 
       │                                     | order       | +generate_series,-oid
       └── render                            |             | 
            └── hash-join                    |             | 
                 │                           | type        | right outer
                 │                           | equality    | (oid) = (typnamespace)
                 ├── hash-join               |             | 
                 │    │                      | type        | inner
                 │    │                      | equality    | (nspname) = (nspname)
                 │    ├── virtual table      |             | 
                 │    │                      | source      | 
                 │    └── render             |             | 
                 │         └── project set   |             | 
                 │              └── emptyrow |             | 
                 └── filter                  |             | 
                      │                      | filter      | typname = 'jsonb'
                      └── virtual table      |             | 
                                             | source      | 
(21 rows)


real	0m0.876s
user	0m0.052s
sys	0m0.018s

All DB instances run on the same nodes in k8s.


z0mb1ek commented Dec 1, 2020

Hi. We use Hibernate too, and I see this performance issue during schema validation:
[screenshot: Screenshot 2020-12-01 at 11 58 26]

I also see this in CockroachDB Cloud, and have opened support request https://support.cockroachlabs.com/hc/en-us/requests/6880

Adding diagnostics details:
stmt-bundle-611918818502377474.zip

rafiss moved this from Triage to Shorter term backlog in SQL Sessions - Deprecated Dec 1, 2020

rafiss commented Dec 1, 2020

I created a remote 3-node cluster running v20.2.2, and added 100 empty tables into it.

Running SELECT tmp.table_cat, tmp.table_schem, tmp.table_name, tmp.non_unique, tmp.index_qualifier, tmp.index_name, tmp.type, tmp.ordinal_position, btrim(pg_catalog.pg_get_indexdef(tmp.ci_oid, tmp.ordinal_position, false), '\"') AS column_name, CASE tmp.am_canorder WHEN true THEN CASE tmp.i_indoption[tmp.ordinal_position - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS asc_or_desc, tmp.cardinality, tmp.pages, tmp.filter_condition FROM (SELECT NULL AS table_cat, n.nspname AS table_schem, ct.relname AS table_name, NOT i.indisunique AS non_unique, NULL AS index_qualifier, ci.relname AS index_name, CASE i.indisclustered WHEN true THEN 1 ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3 END END AS type, (information_schema._pg_expandarray(i.indkey)).n AS ordinal_position, ci.reltuples AS cardinality, ci.relpages AS pages, pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS filter_condition, ci.oid AS ci_oid, i.indoption AS i_indoption, am.amcanorder AS am_canorder FROM pg_catalog.pg_class AS ct JOIN pg_catalog.pg_namespace AS n ON (ct.relnamespace = n.oid) JOIN pg_catalog.pg_index AS i ON (ct.oid = i.indrelid) JOIN pg_catalog.pg_class AS ci ON (ci.oid = i.indexrelid) JOIN pg_catalog.pg_am AS am ON (ci.relam = am.oid) WHERE (true AND (n.nspname = 'public')) AND (ct.relname = 'sometable')) AS tmp ORDER BY non_unique, type, index_name, ordinal_position

against it still only takes 45ms. So unfortunately I still have not been able to reproduce the slowness.

Thanks for sharing those explain plans. I think the plans actually are the same, though maybe the output format changed. I see both include the join on relam:

                      └── hash-join                              |             | 
                           │                                     | type        | inner
                           │                                     | equality    | (oid) = (relam)

The statement bundle you provided should be helpful. We will look into that next.
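
For anyone else following along: a bundle like that can also be collected by prefixing the slow statement with EXPLAIN ANALYZE (DEBUG), or by activating diagnostics for the statement on the Statements page of the Admin UI. For example, with a shortened stand-in query:

-- Produces a downloadable statement diagnostics bundle (plan, trace, stats).
EXPLAIN ANALYZE (DEBUG) SELECT typname FROM pg_catalog.pg_type WHERE typname = 'jsonb';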


z0mb1ek commented Dec 2, 2020

@rafiss Hi, steps to reproduce:

  1. create database test1
  2. run IMPORT PGDUMP 'https://drive.google.com/uc?export=download&id=1USsFE_LypdaAMCRNgExaiL5L4wvLmbP4'
  3. create database test2
  4. start a Java application that creates 10 simple tables (I think it doesn't matter whether they are created by hand or automatically)
  5. restart the Java app and see that it takes 1 minute to start
  6. see in the CockroachDB cluster panel that service latency for this start is 6s
  7. see the slow statements in the CockroachDB cluster panel

I also see the slowness in TablePlus.

Adding more bundles for you:

stmt-bundle-612213293907738625.zip
stmt-bundle-612213287988232193.zip
stmt-bundle-612212071640268801.zip
stmt-bundle-612212065837842433.zip
stmt-bundle-612212028882747393.zip
stmt-bundle-612212014046117889.zip
stmt-bundle-612212034723119105.zip
stmt-bundle-612212020206993409.zip


rafiss commented Dec 3, 2020

Thanks for these steps. I can reproduce more of the slowness now. I think the existence of more foreign keys makes these queries slower, so using the pgdump you provided was helpful.

I believe the underlying cause is new validation logic in v20.2.x that is performed when fetching table descriptors. We fetch these descriptors every time any data from a pg_catalog table is read.

We'll work on improving the performance and include the improvement in a patch release of 20.2 as soon as we can.
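
If that is the underlying cause, then even a trivial read from any pg_catalog table should show the same scaling with the number of tables in the cluster. A minimal way to check (this is a sketch, not a query from the reports above, and absolute timings will vary by environment):

-- Even this trivial virtual-table read pays the descriptor fetch/validation
-- cost described above, so its latency should grow with the table count.
SELECT count(*) FROM pg_catalog.pg_type;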

jordanlewis added this to Triage in SQL Foundations via automation Dec 3, 2020
jordanlewis removed this from Shorter term backlog in SQL Sessions - Deprecated Dec 3, 2020
jordanlewis moved this from Triage to 21.1 December in SQL Foundations Dec 3, 2020
jordanlewis added the release-blocker label Dec 3, 2020

blathers-crl bot commented Dec 3, 2020

Hi @jordanlewis, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

ajwerner added a commit to ajwerner/cockroach that referenced this issue Dec 4, 2020
In 20.2 we added logic to validate descriptors on the read path for getting
all descriptors and introduced a performance regression. In particular, we'd
now retrieve all databases and cross references for all tables. This stricter
validation has proven to be somewhat handy and has become utilized in some
tests to ensure that descriptors indeed are valid. Eliminating this validation
would thus be problematic. This is fortunately easy to resolve as we already
have all of the descriptors sitting in memory. With this change, we use them.

Fixes cockroachdb#57249.

In the previously added benchmark tracking the number of KV calls, the code
used to perform 32 reads and now perform 1 (the 32 mostly has to do with
resolving the system database when validating system tables).

Release note (bug fix): Fixed performance regression introduced in v20.2 to
reading virtual tables which introspect the schema.
craig bot pushed a commit that referenced this issue Dec 4, 2020
57231: roachtest: skip jobs/mixed-version r=dt a=pbardea

This test runs import in a mixed-version setting, which is known to be
not supported. This test is failing often enough that we're not getting
good signal from it.

Fixes #56038.

Release note: None

57382: importccl: support IMPORT in mixed-version cluster r=dt a=pbardea

This commit adds support for running IMPORT in a mixed-version cluster.
Note, that it does not add support for running an IMPORT during the
upgrade of a node however. The IMPORT job would need to be restarted in
that case.

Release note (sql change): Add support for running IMPORT in a
mixed-version cluster.

57542: catalogkv: use in-memory descriptors to validate in GetAllDescriptors r=lucy-zhang a=ajwerner

In 20.2 we added logic to validate descriptors on the read path for getting
all descriptors and introduced a performance regression. In particular, we'd
now retrieve all databases and cross references for all tables. This stricter
validation has proven to be somewhat handy and has become utilized in some
tests to ensure that descriptors indeed are valid. Eliminating this validation
would thus be problematic. This is fortunately easy to resolve as we already
have all of the descriptors sitting in memory. With this change, we use them.
    
Fixes #57249.
    
In the previously added benchmark tracking the number of KV calls, the code
used to perform 32 reads and now perform 1 (the 32 mostly has to do with
resolving the system database when validating system tables).
    
Release note (bug fix): Fixed performance regression introduced in v20.2 to
reading virtual tables which introspect the schema.


Co-authored-by: Paul Bardea <pbardea@gmail.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
craig bot closed this as completed in 4e8a985 Dec 4, 2020
SQL Foundations automation moved this from 21.1 December to Done Dec 4, 2020