Data engineering: going beyond the tech

A look into the soft skills required to succeed as a data engineer in the enterprise.

27 November 2019

Promoting, improving and enabling data access in an enterprise is one of the Data Engineer’s most important and overarching roles today.

The backend of a Data Platform is a beast in its own right. But today I will be writing about the world beyond the technical components of a Data Engineering role and how to be successful in this domain.

I will be sharing some of the insights I have gained over seven years in various Data Engineering-related roles regarding integration into the Enterprise. This blog will focus on the important soft skills Data Engineers need to be effective and add value.

Data Engineering in the Shadows

Many of a Data Engineer’s core duties take place behind the scenes, and stakeholders in the business are normally not even aware of the blood, sweat and tears that went into producing a particular “dataset” or “insight”.

The back-end aspect of this role is not explored in great detail in this write-up. This aspect is hugely important nonetheless and deserves attention.

Instead, we will focus on the customer-facing side and assume that data of good quality is being delivered to some environment from where it can be consumed by various tools, data workers and systems.

So who is our Customer really?

Well, it is any individual in the company who has a need for some form of Data Access.

This can take many shapes but below are some of the more notable and established roles. This is to illustrate examples of who our customer really is and should in no way be seen as a complete list.

In some cases we have Project Managers that will manage the priorities within a team and attempt to steer us towards centralised goals and strategies.

In most instances, there are knowledgeable Analysts and Data Scientists consuming the data and perhaps some dedicated Business Analysts looking after stakeholder-facing tasks. The people in these roles normally provide a natural transition / interface into Business and all the different departments and individuals.

We need to give and do more than just Data!

The challenge in Data Engineering is that our roles are about so much more than just landing data in a consumable space. We still need to play a major role in every part of each dataset’s journey through the Enterprise, otherwise it will get wrangled and hammered out of shape, as I have seen happen all too often.

We are funnels, same as Analysts. Data Engineers should be funnelling data from various sources and data models, in various formats and at various levels, into comprehensible, manageable datasets, normally in some accessible centralised platform. Analysts and Scientists are the next stage of the funnel. They take this a step further by consuming the (somewhat) prepared data and obtaining insights on whichever level is necessary: past, present and future. They also need to convey these insights to the stakeholders in the business in an effective manner.

It would be impossible to convey all of this knowledge to each employee in a company, hence the need for funnels, interpreters and broadcasters.

For this exact reason and many more, I find it an art to deliver value through data on any level in Business.

Below I provide some key skills, concepts and conditions I find of essence when bringing data to stakeholders and for them to extract tangible value from it.

1. Data Quality

Consumers must be confident in the data at all times.

This confidence develops naturally over time, but there are things we can do to help it along. The data should speak for itself, but if you are confident in it then others should get that feeling too!

Once you have identified a key metric in your business that encapsulates a large part of your data, capture it (automatically and at a sensible frequency) and compare it over time.

You might be telling the stakeholders your data is of good quality, but what really matters is whether they can reach that conclusion themselves. The whole concept of Data about Data (or metadata) is really important when attempting to inspire confidence initially, so show them that your data is of good quality.
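As a rough illustration of capturing a key metric and comparing it over time, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `MetricSnapshot` type, the `check_metric` function and the 20% tolerance are hypothetical, and the right metric and threshold will depend on your own business.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class MetricSnapshot:
    """One capture of the key metric (hypothetical structure)."""
    captured_on: date
    value: float


def check_metric(history, latest, tolerance=0.2):
    """Compare the latest snapshot with the mean of earlier snapshots.

    Returns (is_ok, deviation), where deviation is the relative
    difference between the latest value and the historical mean.
    The default tolerance of 0.2 (20%) is an arbitrary assumption.
    """
    if not history:
        # Nothing to compare against yet, so accept the first capture.
        return True, 0.0
    baseline = mean(snapshot.value for snapshot in history)
    if baseline == 0:
        # Avoid dividing by zero: only an exact zero matches a zero baseline.
        if latest.value == 0:
            return True, 0.0
        return False, float("inf")
    deviation = abs(latest.value - baseline) / abs(baseline)
    return deviation <= tolerance, deviation


# Example: three daily captures, then a new value to check.
history = [
    MetricSnapshot(date(2019, 11, 24), 1000.0),
    MetricSnapshot(date(2019, 11, 25), 1020.0),
    MetricSnapshot(date(2019, 11, 26), 980.0),
]
is_ok, deviation = check_metric(history, MetricSnapshot(date(2019, 11, 27), 400.0))
print(is_ok, round(deviation, 2))
```

Publishing the outcome of a check like this next to the dataset is one way to let stakeholders see the quality for themselves rather than take your word for it.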

2. Transformations and Business Logic

Once confidence is obtained, there will be a deluge of questions that need answering and reports that need to be built.

I’m assuming you have landed all your raw data and you have some clever way to process it all into a structured, perfectly formatted table.

Most of the time this is not the end of the journey for data.

The raw data, even in structured format, lacks a few things. It may not take into consideration Data Lineage, Mutations and Historical definition changes (even within a single field) that have taken place throughout a product or application’s lifetime.

Be prepared to do some fancy footwork around data models in general if you are trying to form certain generic views or metrics in a final layer ready for consumption on an automated basis.

This is one of the big challenges in data environments: forming a coherent view across time of a changing business and ever-evolving datasets.
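To make the idea of a historical definition change within a single field concrete, here is a small Python sketch. The field, its codes and the dates are all invented for illustration; the point is only that the same raw value may need a different interpretation depending on when it was recorded.

```python
from datetime import date

# Hypothetical history of one "status" field. Each entry gives the date
# from which a code-to-meaning mapping applied. All values are invented.
STATUS_DEFINITIONS = [
    (date(2015, 1, 1), {1: "open", 2: "closed"}),
    (date(2018, 6, 1), {1: "open", 2: "on_hold", 3: "closed"}),
]


def normalise_status(raw_code, recorded_on):
    """Translate a raw status code using the definition valid on that date."""
    applicable = None
    # The list is ordered oldest first, so the last match is the newest
    # definition that was already in force on recorded_on.
    for valid_from, mapping in STATUS_DEFINITIONS:
        if recorded_on >= valid_from:
            applicable = mapping
    if applicable is None:
        return "unknown"  # record predates every known definition
    return applicable.get(raw_code, "unknown")


# The same raw code means different things before and after the change.
print(normalise_status(2, date(2017, 3, 1)))
print(normalise_status(2, date(2019, 1, 1)))
```

Keeping such a mapping in one documented place, rather than scattered across individual reports, is one possible way to give every consumer the same view across time.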

Make sure to document the final layers of the data architecture well. This is important for future generations of data workers as well as forming that base understanding and reference on the many available datasets. There can be multiple versions of this exposed to different users in the business.

3. Cultivating the knowledge culture

In general, I trust the principle that the more you know, the better.

I have observed that this is not a standard mindset when it comes to data. People tend to avoid complexities they need not be involved in, which is understandable considering the fast pace of today’s professional culture.

That said, if you would like to generate insights from data, you need at least some understanding of what is available for analysis. A big part of this could be fulfilled with a little curiosity and exploration on the non-technical user side.

This is why I promote a base level of cross-functional knowledge sharing. The end-user on an Application level also needs to have a view of the underlying complexities and challenges, otherwise the output of a Data Engineer, and of a Data Team in general, will not meet expectations. The same goes for the backend developer of the Application itself.

When big things are happening in terms of infrastructure, or interesting datasets become available at an Infrastructure and backend level, be sure to share this knowledge with other teams to gain exposure throughout the business. This is what makes a Data Team valuable and sustainable into the future.

4. Be open-minded and have conversations

You need to be willing to put yourself out there and have a conversation or two with a few people around you.

Don’t hide behind your wall of Data and endless spreadsheets and busy schedules. Get out there and try to get a feel for what the business is really about and what makes it tick.

To conclude

As a Data Engineer, there are many approaches to take in order to bring data closer to business stakeholders. Thankfully, this is a concept that enterprises are attempting to refine and improve on drastically because there is a known disconnect between the two.

In this article, I shared with you some of the biggest (non-technical) challenges I have come across when attempting to bridge the gap between Data and the Enterprise, and suggested some of the initiatives that could be taken to resolve them.

I specifically focused on high-level, people-driven concepts because I believe that today, anyone can move data from point A to point B and transform it along the way.

The biggest challenge lies in resolving the complexities around people interacting with and interpreting data on differing social and hierarchical levels, and the conclusions they draw based on this.