
Why we built OpsStack

From our Ops Team to Yours

The OpsStack Story

OpsStack is a great unified operations platform, destined to become the ERP of Operations. But why did we build it?

Our other life is as a large-scale Managed Service Provider (MSP), having started in physical servers, then private clouds, and now public clouds as a key MSP partner of AWS, Alicloud, and others. So we have long and deep experience designing, building, and managing hundreds of large on-line systems, many of which scale to tens and hundreds of millions of users.

We are a rare full-stack MSP, managing every part of every system, from the clouds and hardware to the OS to the core services such as Java, PHP, Python, MySQL, MongoDB, Redis, Nginx, Apache, Hadoop, Docker, etc., plus web monitoring, availability, and more.

We also handle performance, security, reliability, cost savings, DBA, logging, capacity planning, load testing, upgrades, patches, and everything else customers need. We do this while customers continually change things, often every day, often without telling us, so the idea of very dynamic systems is nothing new to our world.

Finally, we are one of the few MSPs in the world taking over large existing systems, often without any documentation or IT help. We then optimize, tune, and rebuild them on-the-fly while they're running. This is pretty challenging, as you can imagine.

Taking over hundred-million-user systems is a little complex

To do all this, we’ve built dozens of tools and systems, including lots of monitoring, diagnosis, troubleshooting, change detection, tracking, and other things to let us scale our business and do this 7x24 for hundreds of different systems simultaneously. We use some open source tools, but few are really designed for the scale, chaos, and diversity of what we deal with daily.

For example, the world talks about DevOps and carefully-controlled infrastructure changes, but our reality is that things change randomly all the time, at the hands of developers, partners, ops teams, and more. So our source of truth is usually the running server or service, though this gets messy when servers that should match, such as web1 and web2, don’t agree with each other. Things drift and change all the time. And most hundred-million-user systems are quite complex and fail in interesting and unpredictable ways.
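To make the web1 and web2 problem concrete, here is a minimal sketch of drift detection, assuming each server's running state has already been collected into a flat dictionary of settings. This is an illustration only, not how OpsStack works internally; the function name `find_drift`, the snapshot layout, and the sample values are all hypothetical.

```python
def find_drift(snapshots):
    """Return {setting: {host: value}} for every setting that differs across hosts.

    `snapshots` maps a host name to a dict of setting -> value (hypothetical
    layout). A setting missing on a host is reported as None for that host.
    """
    # Collect every setting name seen on any host.
    all_keys = set()
    for settings in snapshots.values():
        all_keys.update(settings)

    drift = {}
    for key in sorted(all_keys):
        # Look up the setting on each host, using None when it is absent.
        values = {host: settings.get(key) for host, settings in snapshots.items()}
        # More than one distinct value means the hosts disagree.
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


if __name__ == "__main__":
    # Made-up example data for two servers that are supposed to be identical.
    snapshots = {
        "web1": {"nginx": "1.18.0", "php": "7.4", "worker_processes": "4"},
        "web2": {"nginx": "1.18.0", "php": "8.1"},
    }
    for setting, per_host in find_drift(snapshots).items():
        print(setting, per_host)
```

A report like this only shows where servers disagree. Deciding which server is "right" still takes a person or a policy, which is part of why drift gets messy in practice.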

So we’ve developed lots of tools to manage this, and these tools have evolved into OpsStack, which has unified all the tools, added Docker, Lambda, and other support, and been extended to cover the rest of the stack and the full life-cycle including reverse engineering and change management.

Now, as one of the world’s top MSPs and ops teams, we are bringing this system to the rest of the world, so everyone can benefit from our long and deep experience, dynamic system management, and innovative approaches to running on-line systems.

This is OpsStack, from our Ops Team to Yours.
