• Blog Content
  • About Burns and This Blog
  • To the Hackers and Script Kiddies
  • SE Skills Survey – Help!!

Eric Burns Online

My Virtual Take on Tech

Using Messaging To Reduce MTTR (ChatOps To Fix Things Faster!)

April 13, 2018 | Message/Chat/Collaboration
HipChat integration with New Relic
New Relic integration into HipChat – https://blog.newrelic.com/2015/08/18/chatops/

If you aren’t using Messaging for the teams that diagnose and repair system issues, you seriously need to rethink your methodology. Having worked as a Sales Engineer since late 2010, I have gotten to see a lot of business models that depend on IT to deliver either a service or a product. Over time that Application Delivery Chain has gotten more complex and now includes more and more moving parts. It is no longer as simple as a web server connected to an app server that is then connected to a database. Both the complexity and the expectations have grown dramatically. With that, communicating with context and sharing your firm’s collective knowledge has become essential.

The largest benefit is that the team gets a threaded conversation that is easier to digest and keep current than an e-mail thread. It also leaves a history that can be scanned for the Postmortem. Another plus is that team members who join late don’t have to be verbally brought up to speed, something that can be distracting on a phone bridge. The initial gains come quickly, but for a high-performing team there are some key tips and practices that can make it even more effective.

The first suggestion I’d make is to have a channel dedicated to Emergency Management. Treat it as a very special place. No socializing, no random chatter – it exists strictly for tackling system issues. I’d also suggest a simple name or abbreviation that is clear and won’t be confused with anything else. So “EM” (Emergency Management) works better than “CC” (Command Center), since the latter could be confused with Carbon Copy. “I’ll CC Jim about the performance problems we are seeing” is not as clear as “I’ll EM Jim . . . ”

For this channel, have a schedule of who is in charge. Services like PagerDuty are great for organizing this responsibility. When that person comes on shift they should also update the channel description to include their name and the current status. So it might start as “Jim Smith – All Quiet” but then change to “Jim Smith – investigating performance issues in EMEA.” The process of “passing the baton” also needs to be clear and documented. The end of the shift is not a free pass to head for the door. Once the new person has come up to speed, the two commanders agree that enough detail has been digested and that it is okay to hand over command. At that time, change the channel subject and post a message on the channel saying who is taking over.
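The description and handoff rules above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `EmChannel`, `update_status` and `hand_over` are hypothetical names, and the class is an in-memory stand-in that never calls a real chat or paging API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmChannel:
    """Hypothetical in-memory stand-in for the EM channel (no chat API is called)."""
    commander: str
    status: str = "All Quiet"
    messages: List[str] = field(default_factory=list)

    @property
    def description(self) -> str:
        # Follows the "Name – status" convention used for the channel description.
        return f"{self.commander} – {self.status}"

    def update_status(self, status: str) -> None:
        # The commander keeps the description current as the situation changes.
        self.status = status

    def hand_over(self, incoming: str, outgoing_agrees: bool, incoming_agrees: bool) -> bool:
        """Transfer command only when both commanders agree the briefing is complete."""
        if not (outgoing_agrees and incoming_agrees):
            return False  # the end of the shift alone is not enough
        outgoing = self.commander
        self.commander = incoming
        # Announce the change on the channel so there is a record for the Postmortem.
        self.messages.append(f"{incoming} is taking over command from {outgoing}.")
        return True
```

In a real setup the description change and the announcement would go through whatever chat tool you use; the point of the sketch is that command does not move until both people say so.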

When you are dealing with a small team, the single channel works fine. But what about when you’ve got a giant organization that might have hundreds of people involved in troubleshooting an issue? In that case the manager of each team is the only one who speaks on the main EM channel. Each team also has a separate channel of its own to discuss ideas and decide what is relevant to share on the main channel. It is okay for the manager to delegate someone to share an observation or theory on the main channel as well. In the end this will greatly reduce the noise and allow each team to speak with one clear voice.

There also needs to be extreme discipline around usage of items like @everyone, @channel and @here. The channel leader should be the only one to use those handles. But the rest of the team should be comfortable getting on and DMing (Direct Messaging) the current commander. “Hey @jimsmith – I’m seeing database latency issues for the EMEA region that are impacting EUE. Should we start an EM?” (EUE = End User Experience.) At that point Jim can review the details and decide when to change the channel description or whether an @channel message is merited. Things might start with an @here until enough detail is known. This is especially critical since teams are often distributed around the globe, and an @everyone on a key channel will wake people up.
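If your chat tool lets a bot review messages, the broadcast rule above is easy to express as a check. The following is a minimal sketch, not a feature of any particular product: `allowed_to_post` is a hypothetical helper, and the three handles are the ones named in this post (your tool may use different ones).

```python
import re

# Broadcast handles named in the post; the exact handles vary by chat tool.
BROADCAST = re.compile(r"@(everyone|channel|here)\b")


def allowed_to_post(author: str, commander: str, text: str) -> bool:
    """Return False when someone other than the commander uses a broadcast handle."""
    if BROADCAST.search(text) and author != commander:
        return False
    # Ordinary messages and direct mentions such as @jimsmith are always fine.
    return True
```

A bot could use a check like this to nudge the author privately rather than block the message outright, which keeps the rule about discipline rather than enforcement.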

Tool integration is also key for leveraging Messaging for Emergency Management. Integrations make it easier to avoid duplicated effort and to keep context out front. Commands that link to specific tickets in ticketing systems, display relevant charts or restart systems (for folks with authority) can be very helpful. No more asking “how is the load on system23?” Instead a user can run a command to query the system load from the channel. Everyone sees the number, and it is tracked for the Postmortem. Slack has more than 1,000 prebuilt app integrations.
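To make the “query the load from the channel” idea concrete, here is a rough sketch of what such a command handler might look like. Everything in it is assumed for illustration: the `/load <host>` syntax, the `handle_command` name, and the `get_load` callable, which stands in for whatever monitoring system you would really query. The reply is appended to a plain list that plays the role of the channel history.

```python
from typing import Callable, List


def handle_command(text: str, get_load: Callable[[str], float], log: List[str]) -> str:
    """Answer a hypothetical '/load <host>' chat command and record the reply."""
    parts = text.split()
    if len(parts) != 2 or parts[0] != "/load":
        return "usage: /load <host>"
    host = parts[1]
    try:
        load = get_load(host)  # assumed lookup into your monitoring data
    except KeyError:
        reply = f"{host}: unknown host"
    else:
        reply = f"{host}: load {load:.2f}"
    # Everyone on the channel sees the answer, and it is kept for the Postmortem.
    log.append(reply)
    return reply
```

For example, with a dictionary standing in for the monitoring system:

```python
loads = {"system23": 3.5}
history: List[str] = []
print(handle_command("/load system23", loads.__getitem__, history))
```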

One excellent use for this EM channel is writing up the status details and deciding what to update publicly and when. The commander ultimately should be the one to make the call, but the team can reach a consensus on the criticality of the event. The same goes for what to disclose and when to send the “all clear” message. A firm’s status page is the public view into its stability, how quickly it deals with issues and how well it communicates. Transparency is important, but you also don’t want to post anything that your competition can use to spread FUD – Fear, Uncertainty and Doubt.

Another minor tip is to consider pinning key items to the channel while an event is going on. Anyone joining the channel is expected to review those pinned items before speaking up, unless a Manager has delegated them to get some relevant information to the team. (The assumption being that the Manager is familiar with what is pinned.)

A last item to consider is whether you want to leverage external teams – that is, allow people on the channel who are not actual employees. If you depend on third-party systems and components it makes sense to have them there – provided that they’ve signed NDAs and are aware of the rules and processes in place. It greatly reduces context switching and removes the risk of “the telephone game.” Slack has a great webinar designed for admins that can give some insight into the features that are relevant for inviting external people. Of course some of the features are specific to Slack, but either way it is worth the time to watch.

Having been involved with a multitude of firms with a SaaS offering, I’ve gotten to see firsthand how proper communication with context reduces MTTR and improves job satisfaction. If you’ve got other tips, please do let me know.

 
