Helm defaults not even starting up - [CRITICAL] WORKER TIMEOUT #146

Open
JustinGuese opened this issue Jun 7, 2023 · 12 comments

Comments

@JustinGuese

error

redash main pod

redash-859c5f57c5-jlxjs [2023-06-07 08:48:40 +0000] [40] [INFO] Booting worker with pid: 40
redash-859c5f57c5-jlxjs [2023-06-07 08:48:49 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:13)
redash-859c5f57c5-jlxjs [2023-06-07 08:48:49 +0000] [13] [INFO] Worker exiting (pid: 13)
redash-859c5f57c5-jlxjs [2023-06-07 08:48:50 +0000] [45] [INFO] Booting worker with pid: 45
redash-859c5f57c5-jlxjs [2023-06-07 08:48:59 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:30)
redash-859c5f57c5-jlxjs [2023-06-07 08:48:59 +0000] [30] [INFO] Worker exiting (pid: 30)
redash-859c5f57c5-jlxjs [2023-06-07 08:48:59 +0000] [50] [INFO] Booting worker with pid: 50
redash-859c5f57c5-jlxjs [2023-06-07 08:49:04 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:35)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:04 +0000] [35] [INFO] Worker exiting (pid: 35)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:04 +0000] [55] [INFO] Booting worker with pid: 55
redash-859c5f57c5-jlxjs [2023-06-07 08:49:14 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:40)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:14 +0000] [40] [INFO] Worker exiting (pid: 40)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:15 +0000] [60] [INFO] Booting worker with pid: 60
redash-859c5f57c5-jlxjs [2023-06-07 08:49:24 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:45)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:24 +0000] [45] [INFO] Worker exiting (pid: 45)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:24 +0000] [65] [INFO] Booting worker with pid: 65
redash-859c5f57c5-jlxjs [2023-06-07 08:49:33 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:50)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:33 +0000] [50] [INFO] Worker exiting (pid: 50)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:34 +0000] [70] [INFO] Booting worker with pid: 70
redash-859c5f57c5-jlxjs [2023-06-07 08:49:39 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:55)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:39 +0000] [55] [INFO] Worker exiting (pid: 55)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:39 +0000] [75] [INFO] Booting worker with pid: 75
redash-859c5f57c5-jlxjs [2023-06-07 08:49:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:60)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:48 +0000] [60] [INFO] Worker exiting (pid: 60)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:49 +0000] [80] [INFO] Booting worker with pid: 80
redash-859c5f57c5-jlxjs [2023-06-07 08:49:58 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:65)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:58 +0000] [65] [INFO] Worker exiting (pid: 65)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:58 +0000] [85] [INFO] Booting worker with pid: 85

genericworker

redash-genericworker-79c9547cff-4n76m     if self.connection.exists(self.key) and \
redash-genericworker-79c9547cff-4n76m   File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 1581, in exi
redash-genericworker-79c9547cff-4n76m     return self.execute_command('EXISTS', *names)
redash-genericworker-79c9547cff-4n76m   File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 898, in exec
redash-genericworker-79c9547cff-4n76m     conn = self.connection or pool.get_connection(command_name, **options)
redash-genericworker-79c9547cff-4n76m   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 1182, in
redash-genericworker-79c9547cff-4n76m     connection.connect()
redash-genericworker-79c9547cff-4n76m   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 554, in
redash-genericworker-79c9547cff-4n76m     raise ConnectionError(self._error_message(e))
redash-genericworker-79c9547cff-4n76m redis.exceptions.ConnectionError: Error 110 connecting to redash-redis-master:6379
redash-genericworker-79c9547cff-4n76m 2023-06-07 08:50:07,947 INFO exited: worker-0 (exit status 1; not expected)
redash-genericworker-79c9547cff-4n76m 2023-06-07 08:50:08,953 INFO spawned: 'worker-0' with pid 24

I guess Redis is not being deployed?
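One way to test that guess directly: since the traceback reports Errno 110 (a connect timeout) rather than a name-resolution error, the redash-redis-master name apparently resolved, and the open question is whether anything answers behind it. A minimal standard-library sketch that tells a timeout apart from a refused connection or a DNS failure (the function name check_tcp is hypothetical; host and port come from the traceback above):

```python
import socket
from typing import Tuple


def check_tcp(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, str]:
    """Open a plain TCP connection to host:port and describe the outcome."""
    try:
        # create_connection resolves the name, then tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # DNS lookup failed: the name (e.g. a Kubernetes Service) does not resolve
        return False, f"name resolution failed: {exc}"
    except (socket.timeout, TimeoutError):
        # No reply within `timeout`, or a kernel ETIMEDOUT (errno 110 on Linux)
        return False, "timed out"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listening
        return False, "connection refused"
    except OSError as exc:
        # Anything else, e.g. "no route to host"
        return False, f"error: {exc}"
```

Run it from a pod in the same namespace (for example by exec'ing into the genericworker pod, which already has Python) as check_tcp("redash-redis-master", 6379). A "timed out" result there would reproduce what redis-py reports without involving Redash at all.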

To reproduce

Just helm install it as in the basic example, with all default values.

-> Nothing shows up when port-forwarding.

@Samuel29

Samuel29 commented Jun 7, 2023

+1 for me. More context: I override the PostgreSQL version (14).

my values.yaml:

  postgresql:
    image:
      tag: "14"
    persistence:
      # custom settings for PVC

  image:
    # SL 6/6/23: latest version of Redash docker image
    tag: 10.1.0.b50633

redash pod logs:

Using Database: postgresql://redash:******@redash-dev-postgresql:5432/redash
Using Redis: redis://:******@redash-dev-redis-master:6379/0
[2023-06-07 14:55:46 +0000] [7] [INFO] Starting gunicorn 20.0.4
[2023-06-07 14:55:46 +0000] [7] [INFO] Listening at: http://0.0.0.0:5000 (7)
[2023-06-07 14:55:46 +0000] [7] [INFO] Using worker: sync
[2023-06-07 14:55:46 +0000] [10] [INFO] Booting worker with pid: 10
[2023-06-07 14:55:46 +0000] [11] [INFO] Booting worker with pid: 11
[2023-06-07 14:55:46 +0000] [12] [INFO] Booting worker with pid: 12
[2023-06-07 14:55:46 +0000] [13] [INFO] Booting worker with pid: 13
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:10)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:11)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:12)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:13)
[2023-06-07 14:56:17 +0000] [11] [INFO] Worker exiting (pid: 11)
[2023-06-07 14:56:17 +0000] [12] [INFO] Worker exiting (pid: 12)
[2023-06-07 14:56:17 +0000] [10] [INFO] Worker exiting (pid: 10)
[2023-06-07 14:56:17 +0000] [13] [INFO] Worker exiting (pid: 13)
[2023-06-07 14:56:18 +0000] [30] [INFO] Booting worker with pid: 30
[2023-06-07 14:56:18 +0000] [31] [INFO] Booting worker with pid: 31
[2023-06-07 14:56:18 +0000] [32] [INFO] Booting worker with pid: 32
[2023-06-07 14:56:18 +0000] [33] [INFO] Booting worker with pid: 33
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:30)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:31)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:32)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:33)
[2023-06-07 14:56:48 +0000] [33] [INFO] Worker exiting (pid: 33)
[2023-06-07 14:56:49 +0000] [32] [INFO] Worker exiting (pid: 32)
[2023-06-07 14:56:49 +0000] [30] [INFO] Worker exiting (pid: 30)
[2023-06-07 14:56:49 +0000] [31] [INFO] Worker exiting (pid: 31)
[2023-06-07 14:56:50 +0000] [50] [INFO] Booting worker with pid: 50
[2023-06-07 14:56:50 +0000] [51] [INFO] Booting worker with pid: 51
[2023-06-07 14:56:50 +0000] [52] [INFO] Booting worker with pid: 52
[2023-06-07 14:56:50 +0000] [53] [INFO] Booting worker with pid: 53
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:50)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:51)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:52)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:53)
[2023-06-07 14:57:20 +0000] [51] [INFO] Worker exiting (pid: 51)
[2023-06-07 14:57:20 +0000] [50] [INFO] Worker exiting (pid: 50)
[2023-06-07 14:57:20 +0000] [53] [INFO] Worker exiting (pid: 53)
[2023-06-07 14:57:21 +0000] [52] [INFO] Worker exiting (pid: 52)
[2023-06-07 14:57:22 +0000] [70] [INFO] Booting worker with pid: 70
[2023-06-07 14:57:22 +0000] [71] [INFO] Booting worker with pid: 71
[2023-06-07 14:57:22 +0000] [72] [INFO] Booting worker with pid: 72
[2023-06-07 14:57:22 +0000] [73] [INFO] Booting worker with pid: 73
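The kill pattern in these logs is regular: each batch of workers boots, and the arbiter (pid 7) reports WORKER TIMEOUT roughly 30 seconds later (about 34 seconds in the first log). That interval lines up with gunicorn's default worker timeout of 30 seconds, which suggests, though the logs do not prove it, that the workers never finish starting up within that window. A small standard-library sketch for measuring the boot-to-kill interval from log lines like the ones above (worker_lifetimes is a name made up for this example):

```python
import re
from datetime import datetime
from typing import Dict, Iterable

# gunicorn lines, optionally prefixed by the pod name, e.g.
#   [2023-06-07 14:55:46 +0000] [10] [INFO] Booting worker with pid: 10
#   [2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:10)
BOOT_RE = re.compile(r"\[([^\]]+)\] \[\d+\] \[INFO\] Booting worker with pid: (\d+)")
TIMEOUT_RE = re.compile(r"\[([^\]]+)\] \[\d+\] \[CRITICAL\] WORKER TIMEOUT \(pid:(\d+)\)")
TS_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def worker_lifetimes(lines: Iterable[str]) -> Dict[int, float]:
    """Map each timed-out worker pid to the seconds between its boot and its kill."""
    booted = {}
    lifetimes = {}
    for line in lines:
        boot = BOOT_RE.search(line)
        if boot:
            booted[int(boot.group(2))] = datetime.strptime(boot.group(1), TS_FORMAT)
            continue
        killed = TIMEOUT_RE.search(line)
        if killed and int(killed.group(2)) in booted:
            pid = int(killed.group(2))
            ended = datetime.strptime(killed.group(1), TS_FORMAT)
            lifetimes[pid] = (ended - booted[pid]).total_seconds()
    return lifetimes


sample = [
    "[2023-06-07 14:55:46 +0000] [10] [INFO] Booting worker with pid: 10",
    "[2023-06-07 14:55:46 +0000] [11] [INFO] Booting worker with pid: 11",
    "[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:10)",
    "[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:11)",
]
print(worker_lifetimes(sample))  # {10: 31.0, 11: 31.0}
```

If every worker is killed at nearly the same age, the problem is more likely slow startup than a particular slow request.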

@Samuel29

Samuel29 commented Jun 7, 2023

Update: attached are the pod's logs with LOG_LEVEL=DEBUG.
The bad news is that I can't find any smoking gun :-(

redash-server-debug.log

@Samuel29

Samuel29 commented Jun 7, 2023

Update 2: reproduced with the default PostgreSQL version, as well as with v14 and v15.
Also reproduced on my M1 MacBook with Docker Desktop.
Interestingly, the redash_server container consumes a lot of CPU, but there's no relevant debug info in the logs.


@Samuel29

Samuel29 commented Jun 7, 2023

Resource limits were the culprit!
Once I removed them, it worked a lot better!
@JustinGuese these are the resource limits I had been using (now commented out). I'm still digging to figure out the best fit, since I can't let Redash use all my cluster resources.

    resources: {}
      # limits:
      #   cpu: 500m
      #   memory: 3Gi
      # requests:
      #   cpu: 100m
      #   memory: 500Mi
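A rough model of why a CPU limit can turn into WORKER TIMEOUT: if several single-threaded gunicorn workers start at once inside a container capped at 500m, they share half a core, so CPU-bound startup work stretches out in wall-clock time. The sketch below is a back-of-the-envelope calculation, not a measurement: the 4 CPU-seconds per worker is a placeholder assumption, the function names are hypothetical, and the model ignores CFS quota burstiness and I/O.

```python
from typing import Union


def parse_cpu(quantity: Union[str, float]) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "1" into cores."""
    text = str(quantity).strip()
    if text.endswith("m"):  # millicores
        return float(text[:-1]) / 1000.0
    return float(text)


def estimated_startup_seconds(workers: int, cpu_seconds_per_worker: float,
                              cpu_limit: Union[str, float]) -> float:
    """Wall-clock time for `workers` processes that start together and each
    need `cpu_seconds_per_worker` of CPU, under a container limit `cpu_limit`.
    Assumes fair sharing and purely CPU-bound startup (a simplification)."""
    cores = parse_cpu(cpu_limit)
    # A single-threaded worker can use at most one core, so the effective
    # parallelism is capped by both the limit and the number of workers.
    usable_cores = min(cores, float(workers))
    return workers * cpu_seconds_per_worker / usable_cores


# Placeholder figures: 4 workers as in the logs, 4 CPU-seconds each (hypothetical).
for limit in ("500m", "1000m", "4"):
    print(limit, estimated_startup_seconds(4, 4.0, limit))
```

Under these made-up numbers, 500m gives 32 s of startup, just past a 30 s timeout, while 1000m halves it; the real per-worker cost would have to be measured on the actual image.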

@JustinGuese
Author

JustinGuese commented Jun 7, 2023 via email

@JustinGuese
Author

JustinGuese commented Jun 12, 2023

Nope, still nothing. The workers throw the following error:

Using Database: postgresql://redash:******@redash-postgresql:5432/redash
Using Redis: redis://:******@redash-redis-master:6379/0
Starting RQ worker...
2023-06-12 14:16:43,020 INFO RPC interface 'supervisor' initialized
2023-06-12 14:16:43,020 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2023-06-12 14:16:43,021 INFO supervisord started with pid 6
2023-06-12 14:16:44,025 INFO spawned: 'worker_healthcheck' with pid 9
2023-06-12 14:16:44,029 INFO spawned: 'worker-0' with pid 10
2023-06-12 14:16:45,032 INFO success: worker_healthcheck entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
READY
2023/06/12 14:16:57 [worker_healthcheck] Starting the health check for worker process Checks config: [(<class 'redash.cli.rq.WorkerHealthcheck'>, {})]
2023/06/12 14:16:57 [worker_healthcheck] Installing signal handlers.
2023/06/12 14:17:01 [worker_healthcheck] Received TICK_60 event from supervisor
RESULT 2
OKREADY
2023/06/12 14:17:01 [worker_healthcheck] No processes in state RUNNING found for process worker
2023/06/12 14:18:01 [worker_healthcheck] Received TICK_60 event from supervisor
RESULT 2
OKREADY
2023/06/12 14:18:01 [worker_healthcheck] No processes in state RUNNING found for process worker
2023/06/12 14:19:01 [worker_healthcheck] Received TICK_60 event from supervisor
2023/06/12 14:19:01 [worker_healthcheck] No processes in state RUNNING found for process worker
RESULT 2
OKREADY
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 550, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 606, in _connect
    raise err
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 594, in _connect
    sock.connect(socket_address)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./manage.py", line 9, in <module>
    manager()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 586, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 426, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/app/redash/cli/rq.py", line 49, in worker
    w.work()
  File "/usr/local/lib/python3.7/site-packages/rq/worker.py", line 511, in work
    self.register_birth()
  File "/usr/local/lib/python3.7/site-packages/rq/worker.py", line 273, in register_birth
    if self.connection.exists(self.key) and \
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 1581, in exists
    return self.execute_command('EXISTS', *names)
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 898, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 1182, in get_connection
    connection.connect()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 554, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 110 connecting to redash-redis-master:6379. Connection timed out.
2023-06-12 14:19:08,615 INFO exited: worker-0 (exit status 1; not expected)
2023-06-12 14:19:09,617 INFO spawned: 'worker-0' with pid 23
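The RESULT 2 / OKREADY lines interleaved with the health-check output appear to be supervisord's event-listener protocol rather than errors: a listener writes READY followed by a newline, reads one header line plus a payload of len characters for each event (here TICK_60), and answers RESULT 2, a newline, then OK. Because OK has no trailing newline, the next READY is glued onto it. A minimal sketch of one round of that protocol (a generic listener, not Redash's actual worker_healthcheck code; the function names and the header values in the demo are made up):

```python
import io
from typing import Dict, TextIO, Tuple


def parse_header(line: str) -> Dict[str, str]:
    """supervisord sends event headers as space-separated key:value tokens."""
    return dict(token.split(":", 1) for token in line.split())


def handle_one_event(stdin: TextIO, stdout: TextIO) -> Tuple[Dict[str, str], str]:
    """Run one round of the supervisor event-listener protocol."""
    stdout.write("READY\n")  # signal that we can accept an event
    stdout.flush()
    headers = parse_header(stdin.readline())
    payload = stdin.read(int(headers["len"]))  # body length comes from `len`
    # A real health check would act on the event here (e.g. on TICK_60).
    stdout.write("RESULT 2\nOK")  # 2-character result body, no trailing newline
    stdout.flush()
    return headers, payload


# Simulated round trip; under supervisord you would loop over sys.stdin/sys.stdout.
fake_in = io.StringIO(
    "ver:3.0 server:supervisor serial:1 pool:worker_healthcheck poolserial:1 "
    "eventname:TICK_60 len:15\nwhen:1686578221"
)
fake_out = io.StringIO()
headers, payload = handle_one_event(fake_in, fake_out)
print(headers["eventname"], payload, repr(fake_out.getvalue()))
```

The line that actually matters in the log above is "No processes in state RUNNING found for process worker": the worker keeps exiting because it cannot reach Redis.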

@JustinGuese
Author

So I would say Redis doesn't work... I also can't see any Redis pod, so I guess the Redis pod isn't being created?
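Whether the Redis pod was created can be checked rather than guessed. A sketch that filters the JSON printed by kubectl get pods -o json (run it with -n and your namespace), assuming the pod name starts with the host name from the traceback, redash-redis-master; how the Redis subchart actually names its pods is an assumption here, and find_pods and the example input are made up:

```python
import json
from typing import List, Tuple


def find_pods(kubectl_json: str, name_prefix: str = "redash-redis-master") -> List[Tuple[str, str]]:
    """Return (name, phase) for every pod whose name starts with `name_prefix`.

    `kubectl_json` is the text printed by `kubectl get pods -o json`: a List
    object whose `items` carry `metadata.name` and `status.phase`.
    """
    items = json.loads(kubectl_json).get("items", [])
    return [
        (pod["metadata"]["name"], pod.get("status", {}).get("phase", "Unknown"))
        for pod in items
        if pod["metadata"]["name"].startswith(name_prefix)
    ]


# Made-up example input; in practice pass the real kubectl output.
example = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "redash-859c5f57c5-jlxjs"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "redash-redis-master-0"}, "status": {"phase": "Pending"}},
    ],
})
print(find_pods(example))
```

An empty list means no pod with that prefix exists; a phase other than Running is the next thing to inspect with kubectl describe pod.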

@JustinGuese
Author

Also, this repo is 3 years old, so I guess it's no longer supported, and therefore I won't use it anyway. Thanks for your help, though!

@grugnog
Collaborator

grugnog commented Jun 12, 2023

The chart works with the default values, so I'm guessing this is something specific to your local setup?

@JustinGuese
Author

Hm, it's not only me, though: @Samuel29 hit it too.
I'm using K3s, so maybe there's a problem with that.

@Samuel29

Oh, you make a good point. I'm using a managed Kubernetes cluster (v1.25) with ArgoCD and Helm.

@Samuel29

For the record, here is my setup.
The cluster is made of 10+ nodes, each with 4 cores / 15 GB RAM.
My values (I'm using Redash as a Helm dependency):

redash:
  postgresql:
    image:
      # use postgres v15 instead of 9.6 (!)
      tag: "15"
    persistence:
      # OVH managed K8s
      storageClass: csi-cinder-high-speed

  image:
    # SL 6/6/23: latest version of Redash docker image
    tag: 10.1.0.b50633
    # (... some ingress variables kept for me)
  server:
    # server.resources -- Server resource requests and limits [ref](http://kubernetes.io/docs/user-guide/compute-resources/)
    resources: 
      limits:
        cpu: 1000m
        memory: 4Gi
      requests:
        cpu: 100m
        memory: 500Mi
