Principles of Good Data Architecture

Feb 1


As part of my annual 'learn-unlearn-relearn' practice, I’ve focused on developing a deeper, more comprehensive understanding of Data Engineering as a holistic discipline. When I transitioned into Data Engineering five years ago, I relied heavily on my technical skills to bridge the gap caused by my previously unrelated experience. After gaining a solid foundation in the field, I recognized the importance of going beyond just technical execution and dedicating time to truly understanding the core concepts of Data Engineering.

Recently, I’ve been reading Fundamentals of Data Engineering by Joe Reis and Matt Housley. In this post, I’d like to share some key insights from the book on what makes a strong data architecture.

Choose common components wisely

Common components are shared assets used by multiple teams across the organization (e.g., Git tools, object storage, orchestration frameworks).
Identify tools that benefit all teams.
Avoid a one-size-fits-all approach.

Architecture is leadership

As data engineers, we should embrace architectural leadership and, where possible, actively seek mentorship from experienced data architects.
Architectural leadership means being able to mentor the data engineering team, train them in best practices, make informed technology choices, and align resources to pursue shared goals in both technology and business.

Always be architecting

Data architecture is not a one-time solution; it’s an agile system that must continuously adapt to evolving business needs and technological advancements.
Develop a comprehensive baseline architecture (current state), define the target architecture, and create a sequencing plan to prioritize and schedule changes effectively.
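To make the idea of a sequencing plan concrete, here is a minimal sketch that orders changes so each one runs only after the changes it depends on. The change names (provision_storage, load_warehouse, migrate_reports) and the helper sequence_changes are hypothetical, invented for illustration; the ordering uses Python's standard-library graphlib (Python 3.9+).

```python
from graphlib import TopologicalSorter


def sequence_changes(depends_on: dict[str, set[str]]) -> list[str]:
    """Return the changes in an order that respects their dependencies.

    `depends_on` maps each change to the set of changes that must finish first.
    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical steps from a baseline architecture toward a target architecture.
plan = sequence_changes({
    "migrate_reports": {"load_warehouse"},
    "load_warehouse": {"provision_storage"},
    "provision_storage": set(),
})
print(plan)
```

A real plan would also weigh priority and cost, not just dependencies; this only shows the scheduling constraint.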

Build loosely coupled systems

Break down the system into small components.
Each component of the system can be updated separately, without affecting the other components.
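One way to picture loose coupling in code is a pipeline that depends only on small interfaces rather than on concrete components. The sketch below is illustrative, not taken from the book: Source, Sink, InMemorySource, ListSink, and run_pipeline are all hypothetical names.

```python
from typing import Iterable, Protocol


class Source(Protocol):
    def read(self) -> Iterable[dict]: ...


class Sink(Protocol):
    def write(self, rows: Iterable[dict]) -> int: ...


class InMemorySource:
    """Stand-in for any reader (file, API, queue); swappable without touching the pipeline."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def read(self) -> Iterable[dict]:
        return iter(self._rows)


class ListSink:
    """Stand-in for any writer (warehouse table, object storage)."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: Iterable[dict]) -> int:
        before = len(self.rows)
        self.rows.extend(rows)
        return len(self.rows) - before  # number of rows written


def run_pipeline(source: Source, sink: Sink) -> int:
    """Depends only on the two interfaces, never on a concrete class."""
    cleaned = (row for row in source.read() if row.get("id") is not None)
    return sink.write(cleaned)
```

Because run_pipeline only knows about read and write, either side can be replaced or updated on its own, which is the property described above.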

Make reversible decisions

Favor decisions that can be undone. Reversible decisions let the team decide more often, iterate, switch back when something does not work, and improve.

Plan for failure

Architecture decisions must consider strictly defined and measured failure-related metrics.

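Availability - the percentage of time the system is up, without downtime.
Reliability - the probability that the system performs its intended function without failure over a specified interval.
Recovery Time Objective (RTO) - the maximum acceptable time for a service outage.
Recovery Point Objective (RPO) - the acceptable state after recovery, e.g., the maximum acceptable data loss.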
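These metrics are easy to turn into numbers that can be checked. The following is a rough sketch under my own assumptions: the function names and the way the inputs are measured (total period, downtime, outage length, age of the last backup) are hypothetical, not definitions from the book.

```python
from datetime import timedelta


def availability(total: timedelta, downtime: timedelta) -> float:
    """Fraction of the period during which the service was up."""
    if total <= timedelta(0):
        raise ValueError("total period must be positive")
    return (total - downtime) / total


def meets_rto(outage: timedelta, rto: timedelta) -> bool:
    """True if the service was restored within the Recovery Time Objective."""
    return outage <= rto


def meets_rpo(last_backup_age: timedelta, rpo: timedelta) -> bool:
    """True if the data lost (time since the last good backup) stays within the RPO."""
    return last_backup_age <= rpo


# Example: 2,592 seconds (43.2 minutes) of downtime in a 30-day month.
month = timedelta(days=30)
uptime_fraction = availability(month, timedelta(seconds=2592))
print(f"{uptime_fraction:.3%}")
```

Writing the targets down as explicit thresholds like this makes it possible to test an architecture decision against them instead of arguing about it informally.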

Architect for scalability

The system must scale up to handle large volumes of data and scale back down when demand drops.
However, avoid over-engineering cluster setups that scale up and down: elasticity the workload does not need adds complexity and cost.
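A scaling rule does not have to be elaborate. As a minimal sketch, assuming a hypothetical job queue and worker pool, the function below sizes the pool from the backlog, caps it, and lets it drop to zero when there is nothing to do. The name desired_workers and its parameters are made up for this example.

```python
import math


def desired_workers(queued_jobs: int, jobs_per_worker: int, max_workers: int) -> int:
    """Size the worker pool from the backlog, capped at max_workers, zero when idle."""
    if queued_jobs < 0 or jobs_per_worker <= 0 or max_workers < 0:
        raise ValueError("invalid scaling inputs")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return min(max_workers, needed)
```

A single rule like this is easy to reason about and to cost out, which is often a better starting point than a multi-layered autoscaling setup.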

Prioritize security

Data engineers must take ownership of securing the data they work with.
Zero-trust principle: never trust, always verify. This means users and devices should never be trusted by default.
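In code, "never trust, always verify" roughly means that every request is authenticated and authorized on its own, regardless of where it comes from. Here is a simplified sketch using Python's standard hmac module; the shared secret, the grants table, and the function names are assumptions made for illustration, and a production system would rely on a proper identity provider rather than a hand-rolled check.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> str:
    """Produce an HMAC-SHA256 signature for a request body."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def authorize(
    secret: bytes,
    message: bytes,
    signature: str,
    principal: str,
    action: str,
    grants: dict[str, set[str]],
) -> bool:
    """Verify the signature AND the permission on every call; nothing is trusted by default."""
    expected = sign(secret, message)
    if not hmac.compare_digest(expected, signature):
        return False  # unauthenticated or tampered request
    return action in grants.get(principal, set())  # unknown principals get no access
```

Note that a caller with a valid signature but no explicit grant is still rejected: access is denied unless both checks pass.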

Embrace FinOps

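FinOps integrates finance with DevOps practices, so that engineering and finance teams share responsibility for cloud spending.
Consider the cost structures of cloud systems; for example, monitor the ongoing costs of serverless functions that handle traffic.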
Ask, "How can we optimize daily operations for both cost efficiency and performance?"
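For the serverless example, a back-of-the-envelope estimate can already answer part of that question. The sketch below assumes a common pricing shape (a charge per request plus a charge per GB-second of compute); the function name is hypothetical and the prices are placeholders passed in by the caller, not any provider's actual rates.

```python
def monthly_function_cost(
    invocations: int,
    avg_duration_s: float,
    memory_gb: float,
    price_per_million_requests: float,
    price_per_gb_second: float,
) -> float:
    """Estimate monthly cost as request charges plus compute (GB-second) charges."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


# Placeholder prices for illustration only; look up your provider's current rates.
estimate = monthly_function_cost(
    invocations=2_000_000,
    avg_duration_s=0.5,
    memory_gb=0.5,
    price_per_million_requests=0.20,
    price_per_gb_second=0.00001,
)
print(f"{estimate:.2f}")
```

Running the same estimate with projected traffic shows early whether a function that is cheap today stays cheap at ten times the load.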

About

Benjamin ("Benj") Tabares Jr. is an experienced data practitioner with a strong track record of successfully delivering short- and long-term projects in data engineering, business intelligence, and machine learning. Passionate about solving complex customer challenges, Benj leverages data and technology to create impactful solutions. He collaborates closely with clients and stakeholders to deliver scalable data solutions that unlock business value and drive meaningful insights from data.