Add status.ocaml.org for monitoring #31

Open
tmcgilchrist opened this issue Mar 9, 2023 · 3 comments

Comments

@tmcgilchrist
Collaborator

Migrating issue from the wiki to allow discussion.

What should be on a status.ocaml.org page?

At a minimum we should have operational status of:

What are the options for hosting, independent of the current infrastructure?

@avsm
Member

avsm commented Mar 11, 2023

This is a good list to trawl through: https://github.com/ivbeg/awesome-status-pages. If we don't use one of the hosted options, we could host it on Mythic Beasts, separately from the Scaleway and Cambridge Computer Lab infrastructure.

@tmcgilchrist
Collaborator Author

I'm keen on the style of something like https://status.gitlab.com, which has space for the various public-facing pieces plus the sub-systems that make everything work.

We are starting with a bottom-up approach of building monitoring pages for each of:

Then we can choose something independently hosted to feed those checks into. This is just an update to say we are working towards this, with work still to do. :-)

@avsm
Member

avsm commented Jun 6, 2023

This all sounds good. Might you please coordinate with @mtelvers on his observer.ocamllabs.io prototype mentioned in #42 (comment)? That looks like a good start, but I suspect its database will grow quite quickly as it's storing the results of ping rebuilds in each ocurrent node.

Also, as @hannesm mentions in #48, we need a check for the freshness of opam.ocaml.org. I suspect that would be better done as an email/Matrix message from a build failure in the deployer pipeline rather than as a healthcheck, though, since otherwise it'll be difficult to distinguish between "no pushes to opam-repo recently" and "not a fresh archive on opam.ocaml.org".
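
To illustrate the ambiguity: a healthcheck that only looks at the age of the archive cannot tell a quiet repository from a stuck deployment. A rough sketch of what such a check would need, assuming it could obtain both the time of the last push to opam-repo and the build time of the deployed archive, might look like the following. The function name, the `Freshness` states, and the two-hour `max_lag` are all hypothetical and not part of any existing tooling.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Freshness(Enum):
    UP_TO_DATE = "up-to-date"  # archive already includes the latest push
    PENDING = "pending"        # a push happened recently; deploy may still be running
    STALE = "stale"            # a push has gone undeployed for longer than max_lag


def classify_freshness(last_push, archive_built, now, max_lag=timedelta(hours=2)):
    """Compare the two timestamps instead of looking at archive age alone.

    last_push:     time of the most recent push to the package repository
    archive_built: time the currently served archive was built
    max_lag:       assumed upper bound on a normal deploy (hypothetical value)
    """
    if archive_built >= last_push:
        # Nothing newer upstream, so an old archive is expected, not a fault.
        return Freshness.UP_TO_DATE
    if now - last_push <= max_lag:
        return Freshness.PENDING
    return Freshness.STALE


if __name__ == "__main__":
    now = datetime(2023, 6, 6, 12, 0, tzinfo=timezone.utc)
    # Ten-day-old archive, but no pushes since it was built: still up to date.
    quiet = classify_freshness(now - timedelta(days=10),
                               now - timedelta(days=10) + timedelta(hours=1),
                               now)
    # Same archive age, but a push five hours ago was never deployed.
    stuck = classify_freshness(now - timedelta(hours=5),
                               now - timedelta(days=10),
                               now)
    print(quiet.value, stuck.value)
```

Both cases above serve an archive of the same age, which is why the second timestamp is needed. A notification raised from a build failure in the deployer pipeline sidesteps the comparison entirely, since the pipeline already knows whether a deploy was attempted.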

2 participants