Today in Edworking News we want to talk about SeattleDataGuy’s Newsletter Why Your Data Stack Won't Last - And How To Build Data Infrastructure That Will
As a consultant, I have been called in to review and, in many cases, replace dozens of half-finished, abandoned, and sometimes forgotten data infrastructure projects. The data infrastructure in a few cases may just need a little tweaking to operate effectively, but other times the project is either so incomplete or so lacking in a central design that the best thing to do is replace the old system. Trust me, I’d love it if I could come into a project and simply change a few lines of code, and then everything would just work. However, so many projects are filled with unclear design decisions or resume-driven development that were never rooted in good planning. Of course, business stakeholders may have also pushed to get things done quickly, forcing data teams to take on tech debt that will never be fixed. Don’t get me wrong, you want to get things done and move projects forward. But taking on technical debt is a decision that needs to be made intentionally. Otherwise, like in resume-driven development, your data infrastructure might disappear. This begs the question: How do you ensure the data infrastructure you’re building doesn’t get replaced as soon as you leave in the future? In this article, I wanted to dive into the problems I often come into that require me to replace the current data infrastructure and how you can avoid it. So let’s dive in.
Resume And Vendor Driven Development
When you design a new system, it shouldn’t be driven by a vendor or your resume. Don’t get me wrong, if you have a solid, reliable base for your data infrastructure, testing out the occasional POC with new tools isn’t a bad idea. The problem I find is that many data teams get thrown into having to design data infrastructure with little experience actually setting up one in the past. Don’t get me wrong, we all start with no experience. But it can be difficult to assess what all the nuances of different tooling and designs can be. In turn, many of them will look up articles to see what some of the big tech companies are doing or what a vendor might recommend. This can lead to design choices that likely won’t fit your use case. One great point that was made by an ex-manager in analytics I was just talking to was that it can be easy to pick an ETL or data pipeline tool that sounds good on paper. The account executive and sales engineer will put together the perfect pitch and POC. But eight months into a project, there is a key blocker that the tool either didn’t disclose or could only be found out with experience. Honestly, the vendor themselves might not have even known the limitation. A similar point can be made when you’re trying to design your overall architecture. If you try to follow what a large tech company is doing, it might be a bad fit for your company. You need to be very intentional with your core design.
How To Avoid Resume-Driven Development:
Talk to people who have used the vendor - Most data engineers are happy to share their thoughts on various tooling. Thus, try to find a mix of different data engineers and architects that you can ask about their experience with the tooling. Some will tell you flat out, don’t use said tool while others will sing the praises of some tools.
Read agnostic sources - This can be research papers, non-affiliated consultants, books, etc. Just read everything with a grain of salt!
Start with the business goals - It can be tempting to find the right job for the tool. After all, we all want to put the current “it” tool on our resume. However, when you do start looking to develop data infrastructure, your first goal should be to work with the business to understand what they actually want their outcomes to be. Then you will want to start to take a measured approach to finding the right tools or designing the right system.
Key Person Dependency Issues
One of the constant issues many companies face, especially the smaller ones, is a key person dependency issue. This manifests as either a single key developer or perhaps a small team that all decide to develop the data infrastructure that will be difficult for future teams to maintain or even understand what is going on. This could be because the prior team didn’t document their design, or only documented it for themselves. Sure, they have built data infrastructure that works and solved the business's current problems. They maintain it, and no one asks questions. Perhaps no one realizes that one of the team members has to wake up at 6 AM every morning to check and ensure all the reports and tables have been created. That all works until the day they leave. Then the data infrastructure may hold for a little bit, but eventually, problems arise. Perhaps the company has hired a replacement data team but they may eventually hit the limit of what they can actually support. This was a major issue for many of my past clients. Around ⅓ over the past few years reached out for my services due to this. In one example I came in and found a data vault project that wasn’t being used. It actually had pretty good documentation. However, the team had taken so long and hadn’t fully completed the project which led to their dismissal. What’s worse is because the project was never completed several analysts had to work weekends to create key reporting for the board. For this specific case, we massively simplified what they had built and built a simpler data warehouse that a smaller team could maintain. And of course made sure the reporting was automated so the analysts could stop working weekends.
How To Avoid Key Dependency Issues:
Documentation - Make sure the documentation you create isn’t just written for you, but for someone who has never seen your code before. In particular, highlight key decision points starting with why you may have picked certain tooling, why you may have implemented logic in certain places, etc.
Cross-training (where it makes sense) - For larger teams, consider cross-training. This can also include semi-technical employees from other teams. I am not saying they will or should replace the engineers. But from my perspective, as a consultant, it's been great to walk in and talk to marketing analysts or operational analysts who have some bearing on why the system was developed the way it was.
Building Data Infrastructure Without The Business In Mind
In several of the projects I have come into, the communication between the stakeholders and the data teams had either broken down or never existed. The data team had gone off for several months without ever really delivering anything that could be reviewed by the business. Then by the time stakeholders saw the results it just wasn’t what they were looking for. Your stakeholders need to be brought into what you’re building. Let me give an example. One of my clients I came into work with had a data warehouse. It technically functioned. But within a few days, I realized no one was using the 2-3 dashboards that were being supported by the data warehouse. Why? Well, a combination of reasons but one was it was never actually built with the stakeholders. So they didn’t find the end results helpful. This section’s topic is likely what kicked off my inspiration for this article. There is so much data infrastructure that is built in a silo without being tied to the business. All of which eventually leads to failed projects and data infrastructure not lasting.
How To Build With the Business:
Start with business outcomes - I already said something similar above. But it’s always worth repeating. You don’t want to build infrastructure for infrastructure’s sake. So before starting a project, be clear what you’re trying to do in terms of improving the business.
Keep the business in the loop - It can be very tempting to not have meetings reviewing your dashboards or analyzing the metrics. After all, you have to get work done. The problem arises when at the end of your 3-6 months of development you finally show off your end-product and it wasn’t even close to what the stakeholders wanted.
Building Purposeful Infrastructure
Your data infrastructure doesn’t have to be a mess so to speak, or perhaps I should just say it doesn’t have to be difficult for a future developer to understand what is going on under the hood. My best experience in learning how to program was when I worked for a company and opened up their code base and I understood what was going on. The abstractions weren’t so deep that I couldn’t trace them, and there were clear standards implemented in naming conventions. Honestly, it was just a lot of little things done right. A well-thought-out and simple system. Naming that people can understand. And documentation that isn’t filled with acronyms and assumptions. That can help ensure that years into the future, your data infrastructure is still around.
Image: A well-structured data infrastructure can save time and resources.
With that, I want to say thanks for reading!
Remember these 3 key ideas for your startup:
Avoid Resume-Driven Development: Ensure your data infrastructure decisions are based on business needs, not just on what looks good on a resume or what vendors pitch. Always start with the business goals and validate tools through diverse sources and user experiences.
Mitigate Key Person Dependency: Document your processes thoroughly and consider cross-training within your team. This ensures that your infrastructure remains maintainable and understandable even if key personnel leave.
Build With the Business in Mind: Always align your data infrastructure projects with business outcomes. Keep stakeholders in the loop throughout the development process to ensure the end product meets their needs.
Edworking is the best and smartest decision for SMEs and startups to be more productive. Edworking is a FREE superapp of productivity that includes all you need for work powered by AI in the same superapp, connecting Task Management, Docs, Chat, Videocall, and File Management. Save money today by not paying for Slack, Trello, Dropbox, Zoom, and Notion.
For more details, see the original source.