How to Assess Candidates for Data Engineer Jobs in a Virtual Interview

Interview Process Tips from the Experts

Assessing candidates for data engineer jobs is more difficult in a virtual interview. You don’t have the benefit of body language and proximity to help you gauge an applicant’s performance and suitability for the role.

We’ve discussed how to interview engineers for your startup in a previous article, examining how to optimize your interviewing process with Traey Hatch and Bryan O’Guin. In response to that piece, a few hiring managers asked for our advice on interviewing virtually – which, despite its similarities to in-person interviewing, is still something of a different ballgame.

In this article, we get the perspectives of Dr. John Aven, Director of Engineering at HashMap and Ph.D. in Computational Science, and our founder and Managing Director, George Atuahene.

Just to be clear, Dr. Aven’s opinions are his own and don’t necessarily reflect those of his employer.

What’s the Objective of the Virtual Interview?

The easy answer to this is that you’re interviewing to find the best-fit data engineers for the roles you have available. Dr. Aven takes a step-by-step approach.

“The first thing is to identify those who are really data engineers versus those who are ETL developers or software engineers,” he says. “The term ‘data engineer’ has been misused to the point where someone who has used DataStage or Informatica will call themselves a data engineer but doesn’t have the full skillset of a data engineer.”

Next, you must assess the level of the candidate’s experience. Dr. Aven continues:

“You’ll also have software engineers who have played with data modeling or Spark but don’t really have the data modeling experience. I’m looking for more senior data engineers. Someone who understands the ETL space and software engineering space. It’s harder than one might think.”

Weeding out those who are not good fits as data engineers is not always easy. “50% of the time you can quickly identify those who are not quite true data engineers, but there are many who have done some research and can fake you out,” Dr. Aven tells us.

George Atuahene says, “Data engineering is a very broad space. Our recruiters assess each position on a case-by-case basis instead of just going off the title or job description.

“After speaking with hundreds of hiring managers, we’ve seen numerous cases where a job may be titled ‘data engineer’ but has the responsibilities of a ‘data infrastructure engineer’, ‘data architect’, ‘machine learning engineer’, or some combination of those roles.”

It’s Important to Assess for Soft Skills

Your data engineers will require an array of soft skills. They must be good communicators, and a good interview will develop like a conversation between two people.

“I have an open conversation about what they’ve done and what interests them,” Dr. Aven says. “Can they simply present what they have interest in? I’m not looking for someone who can sit in a back-room and code… I want someone who can also talk to people.

“Can they present their ideas and thoughts in a way that I can follow, even if I don’t have a background in the subject?

“I also like to see what they do outside of work. It’s easy to find someone who is just a techie, but, if you’re going to be people-facing, your clients have interests outside of work too. Sometimes just being able to converse on other topics helps. Some people like to go camping. I like to bowl. Some others are professional ping-pong players – you remember those people. It also helps you to see if they can step away and let their mind refresh.”

Look for Problem-Solving Skills

Once the ice has been broken and you’re happy that the candidate has the right personality to fit in, you can start to delve a little deeper into their working methods. Dr. Aven likes to assess candidates for their ability to solve problems logically.

“I may present a data problem and see how they would solve it,” he says. “I may dive into edge technology to see how adventurous they are, or I may change a constraint in a given scenario to see how they adapt.”

And speaking of adapting, Dr. Aven has a few words of advice to help identify adaptable problem solvers.

“They may be a senior engineer but not have much breadth of experience. I shy away from people who have only worked with Spark,” he says. “I’ve found that most data engineers who have only used Spark can only think of problems within Spark. Those that only know Spark will always default to that technology. I think the industry overemphasizes Spark.”

George Atuahene believes this piece of the interview process is especially important when hiring remote workers.

“When working remotely,” George says, “any professional needs to be able to work without constant supervision and be good at independent problem-solving. Time management, proactivity, and focus are key.”

Dr. Aven agrees.

“Coming from an industry where remote has been the modus operandi, and also having worked on-site, you really need to assess whether a person can work without supervision,” he remarks. “Can they work without a middle manager? That is really hard to gauge.

“You’re going to find a large pool of experienced engineers have worked remotely. The difficulty is when you have a candidate who hasn’t done it before. I don’t think you can really tell. Occasionally you’ll find that some people aren’t doing any work – then you have to figure out what to do with them.

“Most people in tech can work remotely,” he says. “They may not like it, but they will be able to do the work.”

Assessing for Hard Skills

Of course, during the interview you must assess the hard skills that the candidate possesses, and ensure they match the claims on the candidate’s resume.

Dr. Aven shies away from coding exams.

“I don’t like to do coding exams. I like to talk to people first. I want to know about what they’ve done for the past two years,” he says. “I have them walk through what they’ve been learning and, as they talk, I will dig deeper into certain topics to determine the depth of their experience. Someone without depth may have taken a Udemy course on Python but can’t provide any relevant use cases.

“After that, I would have a follow-up with a deep technical dive or give them a coding challenge. If I think the person is a little junior but has potential, then I may lean towards coding challenges. If they’re closer to the architect level, then I would have another engineer sit down with them for a deep dive on the technical side.”

Remove Hiring Bias

“I don’t believe in single-opinion hires,” Dr. Aven says. “As a hiring manager, sometimes you’re too far removed from a very specific project and don’t have all of the specific details.”

This doesn’t mean that the candidate needs to be interviewed by a second person on each occasion.

“Always have a second person talk to the person talking to the candidate as well,” Dr. Aven recommends. “Your personal biases toward a certain characteristic can lead to a bad fit for the team or customer your hire will work with.

“I’ve seen personal bias result in very bad hires. I’ve seen people hired based on a single individual’s opinion of the candidate being very talented, but the person doing the hiring wasn’t involved with any technical projects. This ended up causing a lot of headaches for the team lead and entire team.”

Be Flexible with Your Interview Questions

The questions you ask will depend upon the role and the candidate you are questioning. You may be hiring for a specific set of responsibilities, or you may be seeking a young, hungry data engineer who may not necessarily have the experience of others but offers a world of potential.

“There are pros and cons of different approaches,” Dr. Aven believes.

“We don’t necessarily have a set of questions that we ask all engineers,” George explains. “From a technical perspective, our goal is to determine whether an engineer has the required hard skills and a track record of delivering relevant projects across the project life cycle and with significant individual involvement (including post-production support and optimization).

“Since we recruit for early-stage startups, we avoid candidates who have only worked in large teams or those that don’t have a strong understanding of design concepts.”

Dr. Aven concurs, saying that the questions he asks may change over time and depend upon who he is interviewing. He gives a couple of examples:

“With Python, you might ask somebody, ‘So you know Python… can you tell me how to execute a for loop for a specific situation?’ or ‘You’ve used Pandas and SQLAlchemy, how do you go row by row and write it?’ If the person was a Java developer, they may go straight to a for loop. A more advanced person will say they would use a list comprehension – which is a Pythonic construct to execute a for loop in a compact way. There are optimizations behind the scenes that make it more efficient.
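To make that contrast concrete, here is a minimal sketch of the two answers a candidate might give. The sample records and function names are hypothetical and chosen only for illustration; they are not taken from Dr. Aven’s interviews.

```python
# Hypothetical sample data: a few records with an amount field.
rows = [
    {"id": 1, "amount": 25.0},
    {"id": 2, "amount": -5.0},
    {"id": 3, "amount": 40.0},
]


def positive_amounts_loop(records):
    """Explicit for loop: build the list one append at a time."""
    result = []
    for record in records:
        if record["amount"] > 0:
            result.append(record["amount"])
    return result


def positive_amounts_comprehension(records):
    """List comprehension: the same filter and projection in one expression."""
    return [record["amount"] for record in records if record["amount"] > 0]


print(positive_amounts_loop(rows))
print(positive_amounts_comprehension(rows))
```

Both functions return the same list. The comprehension is usually a little faster in CPython because it avoids the repeated `append` call, but the gain is typically modest, so a strong candidate will probably talk about readability as much as speed.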

“Let’s take a Spark engineer: ‘You have a join that you’ve done in Spark, can you do the same join in Pandas? What would it look like?’ is a question that I might ask.”
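As a rough illustration of the kind of answer that question invites, the sketch below translates a simple Spark join into Pandas using `DataFrame.merge`. The `orders` and `customers` tables and their columns are made up for this example, and the PySpark calls appear only as comments for comparison.

```python
import pandas as pd

# Hypothetical tables, invented for this example.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 20, 30],
        "amount": [25.0, 40.0, 15.0],
    }
)
customers = pd.DataFrame(
    {
        "customer_id": [10, 20, 40],
        "name": ["Ada", "Grace", "Linus"],
    }
)

# PySpark equivalent: orders.join(customers, on="customer_id", how="inner")
inner = orders.merge(customers, on="customer_id", how="inner")

# PySpark equivalent: orders.join(customers, on="customer_id", how="left")
left = orders.merge(customers, on="customer_id", how="left")

print(inner)
print(left)
```

The inner join keeps only the orders with a matching customer, while the left join keeps every order and leaves `name` empty where no customer matches. A candidate who can explain that difference in both tools is probably not tied to one of them.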

Dr. Aven says that how the candidate answers is as important as the answer itself. “You might ask, ‘When you’re doing joins in PySpark and both tables have an identical column, how do you ensure that you only have one instance of that column in your resulting data frame?’

“The candidate should think about how to do it most efficiently. I’ll ask for alternatives and why they chose to do it the way they did. Also, why is it important to make sure there is only one column with that name?”
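One possible line of answer is sketched below. In PySpark, passing a column name (or a list of names) as the `on` argument of `DataFrame.join` keeps a single copy of those columns, whereas joining on an expression such as `orders.customer_id == customers.customer_id` keeps both. The runnable code uses Pandas as a stand-in to show the same duplicate-column problem and two ways around it; the tables and the shared `region` column are hypothetical.

```python
import pandas as pd

# Hypothetical tables that share the join key and a non-key column, "region".
orders = pd.DataFrame(
    {"customer_id": [10, 20], "region": ["EU", "US"], "amount": [25.0, 40.0]}
)
customers = pd.DataFrame(
    {"customer_id": [10, 20], "region": ["EU", "US"], "name": ["Ada", "Grace"]}
)

# Naive merge: the key appears once, but the shared non-key column is
# duplicated as region_x and region_y.
naive = orders.merge(customers, on="customer_id")

# Option 1: drop the duplicate column from one side before joining.
deduped = orders.merge(customers.drop(columns=["region"]), on="customer_id")

# Option 2: join on every shared column, if the values are expected to agree.
# PySpark analogue: orders.join(customers, on=["customer_id", "region"])
shared = [col for col in orders.columns if col in customers.columns]
on_all = orders.merge(customers, on=shared)

print(list(naive.columns))
print(list(deduped.columns))
print(list(on_all.columns))
```

The two options are not interchangeable: the first silently trusts one table’s values, while the second drops rows where the shared column disagrees. Explaining that trade-off is the sort of reasoning the “why” follow-up is looking for.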

George says, “When speaking with engineers, we probe beyond the keywords on their resume and assess whether they understand the big picture of the project and how their role factors into that.

“We also conduct back-door references and check out their GitHub account to get an idea of what they may not be telling us.”

Tips When Interviewing Virtually

We asked Dr. Aven if he had any preferred virtual interview tools and techniques. Here’s what he told us:

  • “Zoom, WebEx, Google Meet, Microsoft Teams, etc. It’s essential to have a video conferencing tool.”
  • “Make sure who you’re hiring is who you’re talking to. You hear of stories where you interview someone, and a different person shows up on day one. It hasn’t happened to me, but I can see how people would be able to get away with that remotely. This is especially true when hiring people overseas.”
  • “Have a consolidated repository of interview notes so different interviewers can see what the other notes were. A lot of HRMS systems have that integrated, but not all of them. Don’t rely upon someone emailing you feedback – make a digital form part of the process.”

Dr. Aven also recommends keeping the hiring process as streamlined as possible.

“I see companies that do four, five, six steps or talk to people for hours,” he says, “and it becomes a long, laborious process. The longest process should be two weeks.

“Remember, good candidates can find jobs within two weeks. Long processes are not beneficial.”

Final Thoughts

We asked Dr. Aven for a few final thoughts. He didn’t disappoint, and provided these bite-sized insights:

  • “Based on my experience, knowledge of Spark and the Hadoop infrastructure is not important – because it’s more important to understand what the ETL process is, what data transformations are, and how to solve a problem given tools that you’re not familiar with. You should be able to learn new tools and use them versus just being tied into a single step.”
  • “If someone can only do SQL, that’s just as bad as someone who only knows Spark. They haven’t exposed themselves to something bigger.”
  • “I think hyper-focus on only one tech stack decreases my interest in a candidate – some people get stuck on tech stacks and can’t be open to a new one.”
  • “The larger the company, the more people are given random titles – like a data integration engineer versus a data engineer. An inexperienced hiring manager may look at those titles and think they’re the same thing, but one person may not actually be building the pipeline.”
  • “I find that some people want to hire a data engineer, but actually need an ETL developer, ML engineer, or data architect. There is a lot of title confusion in the data space. If it’s a startup and you have an engineer who built the front-end, that engineer may have scoped out what they think a data engineer is but may not know what they actually need.”

For expert help in defining your hiring needs and finding the talent, experience, and cultural fit that your company needs when hiring data engineers, get in touch with Kofi Group today.