Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patroni on K8s: Unexpected error from Kubernetes API #1589

Closed
jnatherley opened this issue Jun 14, 2020 · 3 comments
Closed

Patroni on K8s: Unexpected error from Kubernetes API #1589

jnatherley opened this issue Jun 14, 2020 · 3 comments

Comments

@jnatherley
Copy link

jnatherley commented Jun 14, 2020

Hello,

First of all thank you for this package.

I encountered a blip while the leader while the leader was reconciling:

2020-06-14 04:12:51,538 INFO: Lock owner: flytro-postgres-0; I am flytro-postgres-0
2020-06-14 04:12:51,601 ERROR: Unexpected error from Kubernetes API
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 660, in patch_or_create
    ret = self.retry(func, self._namespace, body) if retry else func(self._namespace, body)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 481, in retry
    return self._retry.copy()(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/utils.py", line 331, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 303, in wrapper
    return getattr(self._core_v1_api, func)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 235, in wrapper
    return self._api_client.call_api(method, path, headers, body, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 204, in call_api
    return self._handle_server_response(response, _preload_content)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 173, in _handle_server_response
    raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8sClient.rest.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Sun, 14 Jun 2020 04:12:51 GMT', 'Content-Length': '157'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"rpc error: code = Unknown desc = auth: revision in header is old","code":500}\n'

2020-06-14 04:12:51,603 ERROR: failed to update leader lock
ERROR: 2020/06/14 04:12:51.889625 UploadLabelFiles: failed to stop backup: QueryRunner StopBackup: stop backup failed: EOF
2020-06-14 04:12:52,235 INFO: demoted self because failed to update leader lock in DCS
2020-06-14 04:12:52,241 INFO: closed patroni connection to the postgresql cluster
2020-06-14 04:12:53,011 INFO: postmaster pid=15002
2020-06-14 04:12:53 UTC [15002]: [1-1] 5ee5a3c4.3a9a 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2020-06-14 04:12:53 UTC [15002]: [2-1] 5ee5a3c4.3a9a 0     LOG:  pg_stat_kcache.linux_hz is set to 1000000
2020-06-14 04:12:53 UTC [15002]: [3-1] 5ee5a3c4.3a9a 0     LOG:  starting PostgreSQL 12.3 (Ubuntu 12.3-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
2020-06-14 04:12:53 UTC [15002]: [4-1] 5ee5a3c4.3a9a 0     LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-06-14 04:12:53 UTC [15002]: [5-1] 5ee5a3c4.3a9a 0     LOG:  listening on IPv6 address "::", port 5432
/var/run/postgresql:5432 - no response
2020-06-14 04:12:53 UTC [15002]: [6-1] 5ee5a3c4.3a9a 0     LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-06-14 04:12:53 UTC [15002]: [7-1] 5ee5a3c4.3a9a 0     LOG:  redirecting log output to logging collector process
2020-06-14 04:12:53 UTC [15002]: [8-1] 5ee5a3c4.3a9a 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
2020-06-14 04:12:55.982 36 LOG {ticks: 0, maint: 0, retry: 0}
/var/run/postgresql:5432 - accepting connections
2020-06-14 04:12:56,789 INFO: Lock owner: flytro-postgres-0; I am flytro-postgres-0
2020-06-14 04:12:56,789 INFO: establishing a new patroni connection to the postgres cluster
2020-06-14 04:12:57,286 INFO: promoted self to leader because i had the session lock
server promoting
2020-06-14 04:12:57,294 INFO: cleared rewind state after becoming the leader
2020-06-14 04:12:59,590 INFO: Lock owner: flytro-postgres-0; I am flytro-postgres-0
2020-06-14 04:13:00,090 INFO: no action.  i am the leader with the lock
DO
DO
DO
NOTICE:  extension "pg_auth_mon" already exists, skipping
CREATE EXTENSION
NOTICE:  version "1.0" of extension "pg_auth_mon" is already installed
ALTER EXTENSION
GRANT
NOTICE:  extension "pg_cron" already exists, skipping
CREATE EXTENSION
NOTICE:  version "1.2" of extension "pg_cron" is already installed
ALTER EXTENSION
ALTER POLICY
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
REVOKE
GRANT
REVOKE
GRANT
GRANT
NOTICE:  extension "file_fdw" already exists, skipping
CREATE EXTENSION
DO
NOTICE:  relation "postgres_log" already exists, skipping
CREATE TABLE
GRANT
NOTICE:  relation "postgres_log_0" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_1" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_2" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_3" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_4" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_5" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_6" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_7" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
RESET
SET
NOTICE:  drop cascades to 3 other objects
DETAIL:  drop cascades to type zmon_utils.system_information
drop cascades to function zmon_utils.get_database_cluster_information()
drop cascades to function zmon_utils.get_database_cluster_system_information()
DROP SCHEMA
NOTICE:  extension "plpython3u" already exists, skipping
DO
NOTICE:  language "plpythonu" does not exist, skipping
DROP LANGUAGE
NOTICE:  function plpython_call_handler() does not exist, skipping
DROP FUNCTION
NOTICE:  function plpython_inline_handler(internal) does not exist, skipping
DROP FUNCTION
NOTICE:  function plpython_validator(oid) does not exist, skipping
DROP FUNCTION
CREATE SCHEMA
GRANT
SET
CREATE TYPE
CREATE FUNCTION
CREATE FUNCTION
GRANT
You are now connected to database "postgres" as user "postgres".
NOTICE:  schema "user_management" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
NOTICE:  extension "pg_stat_statements" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "pg_stat_kcache" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "set_user" already exists, skipping
CREATE EXTENSION
NOTICE:  version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
NOTICE:  schema "metric_helpers" already exists, skipping
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
You are now connected to database "flytro" as user "postgres".
NOTICE:  schema "user_management" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
NOTICE:  extension "pg_stat_statements" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "pg_stat_kcache" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "set_user" already exists, skipping
CREATE EXTENSION
NOTICE:  version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
NOTICE:  schema "metric_helpers" already exists, skipping
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
You are now connected to database "template1" as user "postgres".
NOTICE:  schema "user_management" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
NOTICE:  extension "pg_stat_statements" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "pg_stat_kcache" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "set_user" already exists, skipping
CREATE EXTENSION
NOTICE:  version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
NOTICE:  schema "metric_helpers" already exists, skipping
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
2020-06-14 04:13:09,591 INFO: Lock owner: flytro-postgres-0; I am flytro-postgres-0
2020-06-14 04:13:09,990 ERROR: Unexpected error from Kubernetes API
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 763, in touch_member
    ret = self._api.patch_namespaced_pod(self._name, self._namespace, body)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 303, in wrapper
    return getattr(self._core_v1_api, func)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 235, in wrapper
    return self._api_client.call_api(method, path, headers, body, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 204, in call_api
    return self._handle_server_response(response, _preload_content)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 173, in _handle_server_response
    raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8sClient.rest.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Sun, 14 Jun 2020 04:13:09 GMT', 'Content-Length': '157'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"rpc error: code = Unknown desc = auth: revision in header is old","code":500}\n'

2020-06-14 04:13:09,991 INFO: no action.  i am the leader with the lock

Probably the most interesting part is: rpc error: code = Unknown desc = auth: revision in header is old.

Kubernetes (1.18.3)
Postgresql (12.3) TimescaleDB - Community edition (1.7.1)
Spilo image was built from Master with TimescaleDB.

Any recommendations on resolving this issue would be extremely welcome.

@jnatherley
Copy link
Author

#1387 <- Perhaps on the same line as this issue here

@CyberDem0n
Copy link
Collaborator

Looks like Patroni is not handling correctly 500 http status code: https://github.com/zalando/patroni/blob/master/patroni/dcs/kubernetes.py#L78

The K8s documentation suggests that we should retry.
When this code was written in 2017 I was able to find the good documentation about K8s API HTTP status codes that require retrying.

#1387 was a different problem and it was fixed back then.

CyberDem0n pushed a commit that referenced this issue Jun 22, 2020
we should retry on 500, not on 502.
Close #1589
@Hades32
Copy link

Hades32 commented Jul 6, 2020

@CyberDem0n thanks for fixing this, I think I also ran into this! What's your release cycle? I really hoped to try it this fix without rebuilding patroni+spilo myself...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants