Good Retry, Bad Retry: An Incident Story

Beej Jorgensen@lemmy.sdf.org · 2 months ago


RubberElectrons@lemmy.world · 2 months ago

All of what you’re saying seems correct. I think this is more of a meta discussion about how (in this case) retries, even with exponential backoff, aren’t a solution by themselves when you look at the system overall. Any common solution has interesting hidden caveats, and this is one I personally wasn’t aware of.

Practically, adding a timeout budget so that the clients themselves just error out (forcing a manual refresh) sorta accomplishes the same thing you’re positing.
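
To make the timeout budget idea concrete, here's a rough Python sketch of how I'd picture it on the client side. It isn't taken from the article; `call_with_budget`, `BudgetExceeded`, and all the parameter names and defaults are made up for illustration. The client still backs off exponentially (with jitter), but once the overall budget would be blown it stops retrying and surfaces an error instead.

```python
import random
import time


class BudgetExceeded(Exception):
    """Raised when the overall time budget runs out before a call succeeds."""


def call_with_budget(op, budget_s=5.0, base_delay_s=0.1, max_delay_s=2.0,
                     clock=time.monotonic, sleep=time.sleep, rng=random.random):
    """Retry op() with jittered exponential backoff, bounded by a total budget.

    All names and defaults here are hypothetical. clock, sleep and rng are
    injectable so the behaviour can be checked without real waiting.
    """
    deadline = clock() + budget_s
    attempt = 0
    while True:
        try:
            return op()
        except Exception as exc:
            # Backoff ceiling doubles each attempt, capped at max_delay_s.
            cap = min(max_delay_s, base_delay_s * (2 ** attempt))
            # "Full jitter": wait a random fraction of the ceiling.
            delay = rng() * cap
            # If the next wait would overrun the budget, give up now and let
            # the caller show an error rather than adding more load.
            if clock() + delay >= deadline:
                raise BudgetExceeded(
                    f"gave up after {attempt + 1} attempt(s)"
                ) from exc
            sleep(delay)
            attempt += 1
```

A caller would wrap the request in something like `call_with_budget(fetch_page, budget_s=3.0)` (where `fetch_page` is whatever function makes the request) and show a "please refresh" message on `BudgetExceeded`. The point is that the retry pressure from any one client is bounded in time, no matter how long the backend stays unhealthy.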