
A Guide to Choosing the Right Tools and Technologies for Data Architecture

Feb 27



Choosing the right tools

In the previous post, we explored a framework for translating data requirements into a robust data architecture (if you haven’t had a chance to read it yet, you can find it [here]). We emphasized the importance of gathering the right requirements, a crucial step not only in data engineering but also in product development and management.

In this post, I’ll focus on selecting the right tools and technologies to build and maintain a sustainable architecture.


 
  1. Location (On-Premises / Cloud)

On-Premises

The company owns and maintains the hardware and software for its data stack, and is responsible for:

  • Provisioning

  • Maintaining

  • Updating

  • Scaling

Cloud

  • The cloud provider is responsible for building and maintaining the hardware in its data centers

  • You rent the compute and storage resources

  • You can easily scale up to meet demand, or scale back down to save on costs when you don't need the capacity

The industry is shifting towards cloud-based systems due to their flexibility and scalability. However, some companies choose, or are required, to keep certain data systems on-premises due to business needs, regulations, or security and privacy concerns.
 
  2. Monolithic vs. Modular Systems

Tightly-Coupled Components

Monolithic systems are self-contained systems that are made up of tightly coupled components.

  • Simplicity - one technology and typically one principal programming language

  • Easy to reason about and understand

  • Hard to maintain - if you need to update one component, you may have to update other components as well (often, the whole application has to be rewritten)


Loosely-Coupled Components

Modular systems consist of loosely coupled components, a key principle of effective data architecture, as discussed [here].


  • Interoperability - allows data processing tools to integrate easily with others in the data engineering life cycle. For example, data in Parquet format can be paired with any tool that supports it.

  • Flexible & reversible decisions

  • Continuous improvement

In software development, the rise of microservices has led to the emergence of truly modular systems. Instead of combining components from multiple services into a single deployable unit, each microservice is deployed independently, allowing for greater flexibility and scalability.
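
As a rough illustration of loose coupling, the sketch below has a pipeline step depend only on a small interface, so the output format can be swapped without touching the step itself. The names `Serializer`, `JsonLinesSerializer`, `CsvSerializer`, and `export` are hypothetical and chosen for this example only; a real stack would more likely swap between formats such as Parquet or between storage backends.

```python
import csv
import io
import json
from typing import Dict, Iterable, Protocol

Record = Dict[str, str]


class Serializer(Protocol):
    """Hypothetical interface: anything that turns records into text."""

    def dumps(self, records: Iterable[Record]) -> str: ...


class JsonLinesSerializer:
    """Writes one JSON object per line."""

    def dumps(self, records: Iterable[Record]) -> str:
        return "\n".join(json.dumps(r, sort_keys=True) for r in records)


class CsvSerializer:
    """Writes a header row followed by one CSV row per record."""

    def dumps(self, records: Iterable[Record]) -> str:
        rows = list(records)
        if not rows:
            return ""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=sorted(rows[0]), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()


def export(records: Iterable[Record], serializer: Serializer) -> str:
    """The pipeline step knows the interface, not the concrete format."""
    return serializer.dumps(records)
```

Because `export` never names a concrete format, replacing one serializer with another is a reversible decision: the change is made where the serializer is chosen, not inside the step.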
 
  3. Cost Optimization

Building on-premises data systems typically incurs high capital expenditure (CapEx), whereas the shift to cloud-based systems allows many companies to build with little to no CapEx.

Choosing data stack A means committing to its technologies and excluding others, which carries the opportunity cost of missing out on alternative stacks. If stack A proves optimal, the opportunity cost is minimal. However, data technologies evolve rapidly, and components of stack A may become obsolete, incurring costs to switch. To minimize total cost of ownership, build flexible, loosely coupled systems that can adapt to changing needs. Separate immutable technologies (e.g., object storage, SQL) from transitory ones (e.g., stream processing, AI) to future-proof your data architecture.

FinOps (also a key principle of effective data architecture, as discussed [here]) focuses on minimizing data system costs, both total cost of ownership (TCO) and total opportunity cost of ownership (TOCO), while maximizing revenue potential. This can be achieved by selecting cloud services with a flexible, pay-as-you-go model and modular options for quick iteration and growth.

As data engineers, our job is to provide a positive return on the investment the organization makes in its data systems.
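
To make the CapEx versus pay-as-you-go trade-off concrete, here is a minimal sketch that compares cumulative cost over time. The `CostModel` class, the `break_even_month` helper, and every number below are made up for illustration; they are not benchmarks, and a real TCO estimate would also account for staffing, migration, and opportunity costs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostModel:
    """Hypothetical cost model: one upfront payment plus a flat monthly cost."""

    upfront: float  # CapEx, paid once
    monthly: float  # recurring cost per month

    def total(self, months: int) -> float:
        """Cumulative cost after the given number of months."""
        return self.upfront + self.monthly * months


def break_even_month(
    a: CostModel, b: CostModel, horizon: int = 120
) -> Optional[int]:
    """First month at which option a costs no more than option b.

    Returns None if that never happens within the horizon.
    """
    for month in range(1, horizon + 1):
        if a.total(month) <= b.total(month):
            return month
    return None


# Illustrative, invented figures: high CapEx on-premises, zero CapEx cloud.
on_prem = CostModel(upfront=120_000, monthly=2_000)
cloud = CostModel(upfront=0, monthly=7_000)

print(cloud.total(12))                   # cloud cost after one year
print(on_prem.total(12))                 # on-premises cost after one year
print(break_even_month(on_prem, cloud))  # month when on-prem catches up
```

With these invented figures the cloud option is cheaper for the first two years, which is exactly the window in which a stack is most likely to change; that is one way to read the argument for keeping decisions reversible.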
 
  4. Build vs. Buy

    Build or buy

Build Your Own Solution

  • Get exactly the solution you need

  • Avoid licensing fees

  • Avoid being at the mercy of a vendor


Use Existing Solution

  • Open-Source (community)

  • Commercial Open-Source (vendor)

  • Proprietary Non-Open-Source



Building technologies from scratch when off-the-shelf solutions are available can be akin to reinventing the wheel, i.e., "undifferentiated heavy lifting": labor-intensive work that likely doesn't provide significant value to the organization.
 
  5. Server vs. Container vs. Serverless


Server

You set up and manage the server (e.g., EC2 instance)

  • Update the OS

  • Install/update packages

  • Patch software

  • Networking, scaling, and security


Container

Modular unit that packages code and dependencies to run on a server (e.g., a Docker container, often orchestrated with Kubernetes)

  • Lightweight & portable

  • You set up the application code and dependencies

Use a container if your application cannot operate within the limits imposed by serverless platforms, such as execution frequency, concurrency, or duration.

Serverless

You don't need to set up or maintain the server (e.g., AWS Lambda)

  • Automatic scaling

  • Availability & fault-tolerance

  • Pay-as-you-go

Best for simple, discrete tasks, but it can become expensive in a high event rate environment. For a more flexible approach, consider creating and deploying a Docker image for your Lambda function. You can learn more about this process [here].
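
For a sense of what a simple, discrete task looks like, below is a minimal sketch of a Python handler in the `(event, context)` form that AWS Lambda invokes. The event shape (a `records` list of dictionaries with an `id` field) is an assumption made for this example and does not correspond to any particular AWS event source.

```python
import json
from typing import Any, Dict


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Validate one small batch of records and report the counts.

    The "records" key and the "id" field are hypothetical; adapt them to
    whatever actually triggers the function.
    """
    records = event.get("records", [])
    # A record counts as valid if it is a dict carrying a non-null id.
    valid = [
        r for r in records
        if isinstance(r, dict) and r.get("id") is not None
    ]
    return {
        "statusCode": 200,
        "body": json.dumps({"received": len(records), "valid": len(valid)}),
    }
```

The function holds no state between invocations and finishes quickly, which is what lets the platform scale it automatically. The same property means the cost grows with every invocation, so at a sustained high event rate a container or server may be the cheaper choice.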


 







About

Benjamin ("Benj") Tabares Jr. is an experienced data practitioner with a strong track record of successfully delivering short- and long-term projects in data engineering, business intelligence, and machine learning. Passionate about solving complex customer challenges, Benj leverages data and technology to create impactful solutions. He collaborates closely with clients and stakeholders to deliver scalable data solutions that unlock business value and drive meaningful insights from data.



benjamintabaresjr.com is a business intelligence and data engineering independent consultancy that helps businesses transform their data into actionable insights.

Philippines

© 2025 benjamintabaresjr.com. All rights reserved.
