Season 3, Episode 4 - "Is a slow cloud adoption better?" with Matt McComas, VP of DevOps, GM Financial
Andrew: Hey, it's Andrew. Welcome back to Season 3 of Network Disrupted, where I, along with some very smart guests, help fellow technology leaders trade notes on navigating disruption in our space. This season, I've set a goal of exploring the issue of enterprise cloud adoption from as many angles as I can. Today, I'm joined by Matt McComas, who is the VP of DevOps, DevSecOps, CICD, Infrastructure as Code and Kubernetes for GM Financial. He started there on the infrastructure ops side years ago, when DevOps as a concept was nascent. He saw the dysfunction of the traditional way their teams were structured, all the handoffs and the waste and inefficiencies, and started testing some hypotheses around how to streamline that. Long story short, he's still there today with some really sound lessons to share. In this episode, we talk about the evolution of automation and how to measure the success of that. We talk about his company's cloud adoption journey and why it's intentionally slow. We talk about the challenge of integrating greenfield and brownfield environments. What I appreciate the most about Matt is his refreshingly honest admission of what's still a challenge for him, because he's definitely not alone. So let's get into it. If you have a moment, please don't forget to leave a review on Spotify, Apple Podcasts, wherever you listen to these. The feedback is always so helpful and you'll be helping more people like you discover the show.
Speaker 7: Maybe you can give me a sense of the complexity.
Speaker 3: We love the pilot, proof-of-concept approach.
Speaker 4: Influences everything. It influences the human experience.
Speaker 5: There was several failures along the way.
Matt McComas: We want to be early adopter customers.
Speaker 7: You are handling sensitive information.
Speaker 8: Matt's work his love to.
Andrew: So welcome, Matt, and thank you so much for joining me. Why don't you maybe start by giving us an idea of what you're responsible for at GM Financial and what your team actually works on?
Matt McComas: Yeah, thanks Andrew for the opportunity. Happy to be here. So the team that I have at GM Financial is responsible for DevOps engineering. That's primarily composed of pipeline automation and infrastructure as code. We do a lot of automation, sort of general purpose, process automation as well. So the team was kind of born as an automation team from the beginning and really has branched out into these other areas as the years have gone on.
Andrew: You've been at GM Financial for quite some time. So what do you think sort of prepared you for this role, or why were you excited about this opportunity?
Matt McComas: So we were trying to solve problems that we actually saw as real-world issues every single day. We were in the trenches, so to speak, in terms of managing and supporting legacy platforms. These were environments that were critical to the business, but also really difficult to manage. Very fragile, and everything was very manual. So the team that we have now was really born out of a need to solve some of those problems. So that's literally why we're here. Honestly, that's what gets me up in the morning every day, just solving those kinds of problems.
Andrew: Right. When you think about DevOps and a lot of the tool chains and capabilities available today, you start with a tutorial or learning something and it always starts greenfield, which is way easier to deal with than existing stuff. Your enterprise has been around for quite some time and obviously there's tons of stuff. There must have been some interesting strategies to try to drive DevOps around existing infrastructure. I'm assuming also to drive more modern infrastructure in cloud?
Matt McComas: Yeah, it certainly was. One of the interesting things about what you just said is that it reminds me of one of our early lessons learned, which is that a lot of the awesomeness that you get out of DevOps and automation really requires the infrastructure and architecture to support it. So it's interesting, early in our journey, we were trying so hard to do so many things. It just seemed like we were always kind of swimming upstream and it was always a challenge. It's always a challenge in general, but it was excessively challenging. Really what we learned along the way is, you have to kind of think about the whole system itself, the architecture, the infrastructure. Everything needs to be supportive of the DevOps activity that we're trying to initiate. So yeah, more recently, in the last year or two, the organization has finally kind of begun to catch up in terms of understanding that. So I think that's kind of a key element of where we're at in our journey right now, which is a lot of platform and service rearchitecture work to kind of change the core of the business.
Andrew: I'm assuming examples are things like lack of APIs or lack of certain telemetry? Are there a couple of examples of existing infrastructure or architecture that sort of drive the difficulty of automation?
Matt McComas: Yeah. So basically everything you said is just spot on. We came from a world that was not at all API driven. It was very legacy in approach, a lot of SOAP type calls and things like that in the infrastructure, and nothing really had an API. So now we're moving into a world in which we're leveraging APIs all the time. We're leveraging APIM in Azure. So everything we have really is something you can interact with via an API and write automation or code around. So it's such a different world from where we were. Yeah, you're spot on.
Andrew: Were there process issues too? I always remember this one case with one of my customers, where we implemented an automation to do rapid changes to whatever. It doesn't really matter. Everything went through their QA processes. Everything looked like it was working right and we were ready to launch it. Somebody came in and said, "Can't do that because those sorts of changes require this process." That process included things like, what's the backout if it doesn't work? Who's approving? And stuff like that. So they actually had to go back and re-engineer processes to allow for automation. Did you run into process issues too?
Matt McComas: Oh, yeah. So there's lots and lots of that. Honestly, a great example of that is early in our journey, we went on this path of automating server builds and all of the dependencies around server builds. So network dependencies, firewall, DNS. All the components around setting a server up, we automated. We had a pretty good system working and still do. But one of the things that we noticed is that in spite of the fact that we were able to take basically a two to three week process of manual server build and carve it down to about, maybe an hour or two, we still had end-to-end about a three week server provisioning process. What we found is that all the processes out in the front that are prerequisites to even starting the server build process were the things that became the impediment. So the real lesson learned for us there was that, to your point, it's not just about the automation. They say DevOps is people, process and technology. I find honestly that the technology side is the easiest part to do, which is the tool and automation implementation side. Processes are more difficult, because it really requires a lot of humility, because a lot of people are married to the process, especially people that have been around a while. They've always done it that way. Many people kind of get stuck in this rut, where they think that process can't change. When in reality, there isn't really a process that can't be changed. A lot of times, if you just step back and look at it and start asking why you're doing things the way you are, you find, well, it's because we were trying to compensate for a mistake or some problem that happened years ago that nobody really knows about anymore.
Andrew: Yeah, 100%. I was about to say, a lot of these processes become heavyweight because of some failure seven years ago where somebody forgot to do X, or whatever the case might be, and this failed. So therefore, there's this step in the process, and people often don't remember what actually happened. But something I talk about a lot is, well, the requirements have changed. Like, you're driving this automation. So ultimately, and I'll make an assumption, you can make lots of small changes. When you're making lots of small changes that can be automatically, hopefully, asserted to have been done correctly and backed out if necessary, the risk profile's way different than a change window where tons of stuff gets thrown in by everybody. So the requirements are also different. So that should create a process change, but yeah, people hold onto that major outage from five years ago: we can't do that again without doing this first.
Matt McComas: Absolutely.
Andrew: Yeah, organizational memory. How do you work through that process issue?
Matt McComas: That's a great question. It's obviously very difficult to do that. To work through the process issues, you have to, like I said, have some humility and step back and take a look at it. One of the things that's true as well is that process is really a risk management activity. It's something that was implemented to mitigate a past problem, as you mentioned. In theory, as you get better at the automation side, the automation builds sort of safety into the process that you're trying to implement. What you should be able to do is demonstrate that you're able to achieve that safety and risk management through the automation and check all the boxes. That right there can become really sort of a fundamental page turner in terms of process. I'll give you a great example. We recently have implemented automated change management for some of the things that we do. We're a ServiceNow kind of shop, as so many are. The change management process in ServiceNow kind of follows a traditional ITIL process and things like that. That team tends to be very sort of rule-oriented, very structured and prescriptive in their approach. So they were naturally skeptical. We approached them and said, "Hey, listen, we can automate everything around this, then demonstrate that the change was successful and that we met all the requirements. There's no need for us to go through this manual process. We can just automate it all." They were very skeptical, but we did a pilot. We demonstrated it to them and we won some converts. We built a little bit of confidence. We did a little bit more, we built some more confidence. Then we got to the point in which even the people that didn't want to change originally now were at a point where they were like, "Why can't we change? You guys have demonstrated this meets all the requirements." In fact, it's even better because now we have traceability. 
In the manual change world, what was happening is if the change failed, people could fudge the system and they could reimplement their change. Maybe it wasn't always very accurate. Maybe you had a successful change that really wasn't successful. Really looking at it, maybe it's successful because you retried it five times. But because the change process is manual, all that work, all that extra effort, was hidden and invisible. But whenever you have all this chained together and it's automated and it is traceable, now, if you have a true change failure, you can't really hide it. So it's better from a change management standpoint, and it's better when you're tracking things like MTTR and some of the core DevOps metrics, because now you have a change metric that's truly accurate. It's truly reflective of what happened because the people aren't involved. Instead, it's the automation that's doing the work. So that was a success story and a great example of how we changed process.
Andrew: Yeah. No, that's good. It always helps by proving some success along the way. But I really like what you said about demonstrating the success of the change. I think too many people think about automation as calling APIs versus successfully making changes. Those are two very different things.
Matt McComas: Exactly.
Andrew: I think a lot of the lessons learned in the past, even the five year ago problem, are like, well, we made this change and somebody forgot to update the firewall or the load balancer. So now those teams need to be available for a change window and you need to coordinate change windows with those guys, those sorts of rules from the past. Those, too, you can demonstrate how you can automate around, because now you're doing automation. So if you know there's a fault where, if the firewall rule isn't changed as well, this isn't going to work, then go either change the firewall rule or at least assert in the automation process that it's changed. So I think people also think historically about change around a single system and worry about the broader impact, as opposed to automation not being about changing a DNS record. It's about deploying or provisioning a server and everything involved in that, which is good. Obviously, I'm not surprised to hear you're a ServiceNow user, and ServiceNow has done a really good job of penetrating enterprises. It's interesting from my perspective to see different companies and how much they're driving ServiceNow to automate, versus as a next generation ITIL, ITSM system to port processes evolved, a move from Remedy to ServiceNow, versus as part of a broader automation strategy. One of the problems lots of enterprises have in driving automation strategies is not having a source of truth to automate against. If you don't know the current state accurately, then oftentimes it's difficult to automate. Did you run into any of those sorts of issues?
Matt McComas: Yeah, that's a continuing problem, I think. We'll just take configuration management, for example, in Chef. We happen to be a Chef shop. Chef is an awesome platform and an awesome tool, and it does a good job. It's difficult to implement in brownfield server fleets, for example, where you have many hundreds and thousands of servers and you don't really have a lot in the system. The Chef system doesn't know much about those servers and you don't even have a really good CMDB, if you will, around those servers as well. So it's very unknown. It's really difficult in a brownfield enterprise to implement some types of automation because you almost have to redeploy everything to really get it right. It's easy if it's greenfield because you're starting from scratch. Again yeah, to your point, if you're going out there and trying to automate things that are already there, then you're not going to have a lot of information on them, most likely. It's challenging for sure.
Andrew: Yeah, because Chef's going to try to assert some new desired state. But what if the old state isn't known, or why that's the old state? There's some manual workaround on this server because some static route was added, because of some issue that nobody remembers, and it's easy to have that removed if you don't know why it was there in the first place. I think that's an ongoing challenge. So that's the brownfield side. You guys are pushing hard, you mentioned Azure. So I'm assuming that you're pushing into Azure as well?
Matt McComas: Yes, sir. That's all greenfield. This organization, I think very wisely, made a commitment to largely avoid a lift and shift approach to cloud adoption. So we have attached our cloud efforts to really an application platform rearchitecture project. So in essence, instead of moving platforms, we are redeploying them and refactoring them into Azure and then deprecating the old. It's a lot more difficult and time consuming, but it's also a much more sustainable approach. So we're currently, I think, on about a three to four year journey to implement all of this and really change the heart of the business. All the core platforms that handle auto finance within GM Financial are being pushed in this direction. So it's very much part of what we're doing every single day.
Andrew: Yeah. Obviously, it's greenfield versus brownfield, but it's also, I don't know. Maybe I'm thinking about this the wrong way. But in some cases, DevOps became a thing, started to exist, and companies started investing in it because of this gap between development and operations. You go into a cloud environment and in many cases, especially if you've refactored something, the expectation is that the team responsible for developing the software is also automating its deployment and everything else. It's sort of part of the software, as opposed to necessarily having another team right with them. Well, maybe put differently, let me suggest this, and I'm curious what your view is. The DevOps team would be more aligned with ops in the on-premises world. In the cloud world, they're way more aligned with development. Is that a naive statement of mine? That's just an observation I've had over time.
Matt McComas: Probably true, especially in our organization. I think it depends upon how the DevOps team kind of evolves over time. There's some good work out there in the field now on this. I don't know if you're familiar with Team Topologies, which is a book that's published. This essentially goes through all the different models of how DevOps teams are deployed. But in our case, I think what you're saying is pretty close to accurate. We were very ops-oriented before the cloud. Although I will say that we were trying to solve problems for software delivery pretty early. We were deploying CI/CD pipeline platforms and trying to automate the software delivery process. But certainly as we've gotten into the cloud, everything becomes a lot more software developer-centric because, to the point you're making, I think we all know the cloud compute model really kind of requires that you think about everything in terms of software delivery. Now you're delivering your infrastructure and your applications in the same or similar pipelines and it's all done in code. So yeah, that really requires a shift. Fortunately for us, it wasn't a difficult one, because by the time we got to the cloud transformation project, we were already two or three years into building DevOps team skill sets. So we were already kind of heading that direction anyway. But I could see how it would be challenging if you were not, if you were just trying to make that leap without any prep work.
Andrew: Yeah, for sure, because you get into knowledge gap areas too. Not just anybody in the traditional network can log onto a switch and make changes or make an API call. But you open up cloud compute, certainly in a sandbox, and until governance is put in place, and different companies do it different ways, then all of a sudden you have somebody who doesn't necessarily have deep knowledge in provisioning networks, or whatever the case is, provisioning networks without necessarily understanding the ramifications, which runs into issues sometimes. So I'm assuming your team is sort of multi-domain knowledgeable. What was your philosophy in building the team?
Matt McComas: It's a good question. So I think early on, it was difficult to find people who specialized specifically in DevOps itself. For a long time, there's been this resistance to having a DevOps team by name and even people titled DevOps engineers. So we focused early on finding people who had a lot of good infrastructure ops and IT experience, but also knew how to write scripts, either in PowerShell or Python or any of the scripting languages that are common. We wanted people who knew how to interact and manage their day-to-day work through automation. So those were kind of the early types of traits that we were looking for in people. Then I think also learning agility. Certain people can learn faster than others. So we always were really looking for that as well. To me, if you can learn fast and you can pick up complex topics quickly, then there's really almost no limit for you. But that's challenging because, I don't know, honestly, I haven't figured out if that's something you're born with or something you can cultivate, and it's probably a whole debate in and of itself. But yeah, those two things. As we kind of got further along, we actually started hiring people who had a software development background. That's what we've done a lot more of in the past two to three years. So what's interesting is it's kind of evolved over time. But the net effect of that is, we have teams that are pretty well-rounded. They understand the ops side and the development side. Of course, you're going to have people in those teams that are SMEs in certain areas and not in others. So everyone kind of complements each other, I think, pretty well. We've got teams that have a good mix of kind of both skills.
Andrew: Yeah. I think that's important. Obviously bringing in software developers as well, and software development processes, because you're developing software. Your software might have defects and your software needs to be tested. At the end of the day, it's simply more software. How do you think about measuring the success of what you're doing, or any key metrics along the way? I think it's often difficult. I've seen it done different ways. I've read some books about different ways to potentially measure. But ultimately, you're trying to help the business release faster. I'm just curious how you look at the success of your team?
Matt McComas: That's a great question and it's a tough thing to answer. Metrics are difficult but they're so vital. I think that organizations really have to focus on data and get good at metrics, because I think we really have to measure everything, or as much as we can. But in terms of the success of my team, I kind of look at it in terms of, how many things have we automated? Specifically, how many things have we made self-service so that teams can self-serve? So that's what we've done. That's kind of been the trend of ours for the last 12 to 18 months. We actually created a service early last year called Pipeline as a Service. It's a way to self-provision your own pipeline. So if you're in a product or development team and you need a CI/CD pipeline, you go to this portal. You fill in some values and you hit submit and you get a pipeline. The great thing about that is, it includes all of the compliance and security requirements necessary in order to really take advantage of it. So it meets our requirements. Anybody can produce a pipeline, but producing it the way we need to have it produced to meet GM Financial's requirements is what we truly automated. Now I look at the success of those kinds of services, or infrastructure self-provisioning kinds of services. How successful are we at producing them and how much are they being used? Those are kind of the measures I look at, adoption measures. Unfortunately though, there's also a development and product team upskilling side to this. So what we've learned in the last year or two really is, you can create a great service. It can be truly automated and truly self-service and truly innovative, but it's not worth much if nobody uses it. So you have to not only create such a service, but you have to help people learn how to use it. So there's very much an adoption side to this as well. So I think it's all about empowering the development teams and producing KPIs and metrics around telling that story. 
That's what we're trying to do right now in our organization, and it's difficult. It really is. Part of that difficulty is something we're all facing right now in the IT industry, which is the skill gap. I think that technology is moving faster than people can keep up with in terms of skill. You're seeing it widespread. It's just endemic. Everyone's got upskilling and reskilling challenges in their teams, and you can't hire people fast enough and you can't pay them enough, it seems like, these days, to get people who really understand how the tech works. So all that together is quite daunting, but I think the metrics piece is critical and we're working on it, honestly, right now. It's a constant challenge for us. How do we measure? What's the best way to measure? How do we tell that story? The metrics should actually tell the story. That's really what it's about. You're trying to tell stories. You're trying to craft the narrative that explains where you're at and where you want to go.
Andrew: Yeah. Metrics are always dangerous because you decide on the wrong metric and you drive people to potentially do the wrong thing in order to meet that specific metric or that specific SLA. It's oftentimes tough to do in automation because you also want to tie it to business value. Oftentimes, that could be around cost, that could be around man-hours saved, that could be around whatever the case might be, but it certainly could be tough. The skill set issue, for sure, is something I see on and on. I think you nailed it, which is technology's changing super rapidly. You can decide the best way to deploy A, B or C on a cloud, or use these services or those services, or use network peering. Then the cloud vendors come up with transit gateways. So we have customers on, like, architecture four, five or six of their cloud deployment. They might iterate quickly because the clouds can provide more capabilities at a rate that is beyond the ability of an enterprise that doesn't have a fleet of people that are well-embedded to absorb those changes and figure out what the new best practices are. It's a tough problem broadly, and this technology is new. So it's not like you're going to go hire a 10-year veteran who's wise, who's been through enough failures and has tried things in different ways. My concern always is, in the face of that challenging environment, new technology, not necessarily an expert around, how people might act. To solve the problem, the first thing they do that works, well, that's the answer. It might not be the right answer. It might not be scalable. It's going to bite you later. It's trial and error software development, which any of us veterans know leads to issues downstream, because you're not considering much. You're just trying to make something work.
Matt McComas: Absolutely. Yep.
Andrew: Interesting. Where are you in the organization? Do you work into the operations team? Do you work into the line of business? Into core IT or what part of IT and where do you fit?
Matt McComas: Like I said early on, we grew up on the ops side. Early last year, they pulled us out of ops. So now we literally sit between operations and the development organization. So it's kind of an interesting place to be. It's truly, in many ways, DevOps, because it kind of sits there in the middle. We're certainly in IT, obviously, as well. So the interesting thing about our teams and how we're continuing to evolve is, we're a shared services organization that sits underneath product delivery and the teams that are sitting in the development org. We're sort of foundational teams. I like to say that we build the [inaudible] that help teams move fast. Yeah, very much, our customers are software development. Whereas if you're sitting in a product team, your customer really is the business, or it should be. Yeah, that's kind of where we sit right now.
Andrew: Yeah. That makes sense, as a shared service. Is there governance around that, or are you a shared service that sort of sells your services? If the business wants to go build an application in Azure, can they do everything themselves, or is there more of a governed organizational approach?
Matt McComas: I think it's a lot more. It's a lot more governed. There's certainly opportunities to improve and mature that governance. But yeah, we're certainly not in a place yet in which the business can go kind of deploy something into Azure at will. Now certainly the business is critical and kind of pivotal in helping us determine certain directions in terms of the services they either want to develop or mature, but they don't really play too much of a role in the technology side of it. That's mostly handled on the IT side. It is handled in enterprise architecture, but the whole organization is evolving. Really, the introduction of cloud has changed so many things and pushed on so many teams to kind of change their approach. In terms of our own teams, we're still kind of in the process of evolving an engagement model that kind of helps teams to understand how to interact with us. The teams we have and the organization we have is only about 18 months old or so. Last year, in 2020, I think it was just surviving and trying to figure out how to operate in the new reality we all faced. Literally, the interesting thing is we had an org change and we kind of repositioned ourselves within the organization on March 1st, 2020. Well, I think the big date everyone circles on the calendar, most of us, is March 13th, 2020, as the last day in the office. So we were going through not only the changes in the world around us, but huge change in the organization itself. So I think for us, all of last year was just, what are we and what are we trying to become? How do we succeed and how's the company going to be successful given this new reality? All those unanswered questions. So luckily, all of that stuff kind of got answered over time. But we're just now getting to the point in which we can think about the governance you referenced, which is, how do people interact with us? How do we interact with them? How do they get help? 
How do we help them? That sort of thing, and standardizing process and making it easier and less confusing, if you will.
Andrew: Yeah, but I think that's right. Regardless of the fact that the pandemic should never have happened. But even without that, if you go governance first, then you don't know if it's right. I think you need to start leaner and then build it up where it's necessary. The reason I ask that question is we're seeing more and more of that starting to build up now. That might be in, who's ultimately responsible for what? By the way, the cloud vendors help support this as well. So you can have a bit of a separation of concerns of who's allowed to do what. I don't just mean permissions for different types of things in cloud, but provisioning networks in your account versus this account. They're adding more and more controls to allow for that governance. But yeah, I think you're approaching it the correct way. So it sounds like the organization is well aware, both through your efforts and the successes you're having, but also just in general, of the requirements for looking at things differently and changing processes. At least I haven't heard that you sort of need to fight for organizational alignment and strategy around driving this way. It's that you exist because of those strategies?
Matt McComas: Yeah, that's correct. In fact, the reorganization that kind of created the team and the organization I'm part of was a result of a formal IT transformation effort that was initiated in 2019. It really kind of came to, I don't know if it came to a climax, but it kind of began to crest, if you will, early last year. One thing I've learned is transformation is a continuous improvement process. You're never really finished. I think our organization is learning that too, because they had a transformation project. They wrapped it up, and then later they're like, wait a minute, we still have a lot of work to do. I think now they're finally figuring out there's really not an end to transformation. It's always some place on this journey. Right?
Andrew: Yeah. Well said, and I think that's a good thought to wrap up on. Yeah, this isn't, we just need to change this to this, and then we're status quo again. That's the power of it, because now you can continually improve what you do, both as new capabilities come out and as the business requirements change. I think I've said this a thousand times, but the ultimate goal of a business is they want you to meet their unpredictable demands at some predictable cost, with reliability and security and the same quality that they've expected before. Their requirements are just going to change way faster. You need to deal with it and that's just going to continue to happen. It sounds like you're really driving some excellence there. I also like the fact that it sounds like you're doing a bunch of reading too. I'm always like, I read and I read and I read. Just like technology gets dated quickly, sometimes books can get dated quite quickly. But oftentimes, the advice is the same advice as it was 20 years ago. As organizations like yours become more software driven, then there's best practices throughout the years that just simply apply. It sounds like you're sort of expanding your knowledge base going forward too. So that's great.
Matt McComas: Yes, absolutely. We're always learning new things and we're always reading books constantly. So it's been a real journey for us. Not just the technology journey, but a real kind of transformation journey for the leaders as well, who aren't necessarily hands on keyboard on the tech side. So it's been fun.
Andrew: Yeah, for sure. Matt, it was an absolute pleasure talking to you. I wish you the best and continued success, and thank you for joining.
Matt McComas: Awesome. It's great to be here.
In this episode of Network Disrupted, Matt McComas is here to discuss the merits of a slow cloud adoption. Matt is the VP of DevOps, DevSecOps, CICD, Infrastructure as Code, and Kubernetes for GM Financial, where he helped streamline the way their teams are structured.
Today, Matt dives into the evolution of automation with insights on measuring its success. He talks about GM Financial's cloud adoption journey, the reason it is intentionally slow, and how to work through process issues as they come up. Listen now!
Let me know what you thought of today’s discussion! You can tweet me at @netwkdisrupted + @awertkin, leave a review on Spotify or Apple Podcasts, or email me at firstname.lastname@example.org.