Building the future of AI-powered retail starts with trust

Jiazhen Zhu leads the end-to-end data team at Walmart Global Governance DSI, a diverse group of engineers and data scientists united to create a better platform through data-driven decisions and products. Zhu first joined Walmart Global Tech in 2019 to oversee both data engineering and machine learning, giving him a unique perspective on the interrelated worlds of DataOps, MLOps and data science. Prior to Walmart Global Tech, Zhu was a data scientist and software engineer at NTT Data.

Can you briefly introduce yourself and describe your role at Walmart Global Tech?

Currently, I am a Senior Data Engineer and Machine Learning Engineer at Walmart Global Tech. I work with data end-to-end, from where we get the data to how we clean it, transfer it, feed it into model training, and finally move the models into the production layer. I enjoy overseeing this process and bring a decade of experience working in both data and machine learning, building platforms for both.

What has your professional background been like so far?

After completing my bachelor’s degree in computer science, I worked as a software engineer at Citi, focusing on the data warehouse used to build models and support scalable data modeling. Then I completed a master’s degree in data science and worked as both a software engineer and a data scientist. This is all interrelated because data engineering and machine learning engineering are really just part of software engineering – usually the software engineer will focus on the application, the user interface or full-stack tasks, while data engineers and machine learning engineers are more focused on data and models, respectively.

How does Walmart Global Tech fit into the whole of Walmart?

Walmart Global Tech works on cutting-edge technologies that create unique and innovative experiences for our associates, customers and members of Walmart, Sam’s Club and Walmart International. We solve the myriad of challenges every retailer faces, whether dealing with suppliers, distribution, orders, innovation, shopping experience or after-sales service. What all of these elements have in common is that they all benefit from technology.

You oversee both data engineering and machine learning – any lessons for others on the benefits of structuring the organization this way? It must give you a unique perspective on data-centric AI, as in your recent blog.

In other companies, these functions are often separated in different organizations. My own experience is that if we can combine the different roles – especially data scientists, researchers, data engineers, machine learning engineers and software engineers – in one team, it can accelerate the development of products. Since most areas require specialist knowledge, combining many different roles on one team can also help bring new, innovative ideas to the product.

What do you think of the build-versus-buy calculus as it relates to ML platforms?

For MLOps platforms, which is obviously a new area, it varies – it’s not as simple as saying we have a tech stack that we follow every time. We approach those decisions on an as-needed basis, and then we make sure that each component is easy to replace or rebuild, so that we don’t have to rebuild everything just because one component no longer suits our needs.
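
One way to picture "easy to replace" is to have pipeline code depend on a small interface rather than on a specific tool. The sketch below is only an illustration of that idea, not a description of Walmart's platform; `ModelStore`, `InMemoryModelStore` and `publish_model` are hypothetical names invented for this example.

```python
from typing import Dict, Protocol


class ModelStore(Protocol):
    """Hypothetical interface: any storage backend only has to offer these two methods."""

    def save(self, name: str, model_bytes: bytes) -> None: ...

    def load(self, name: str) -> bytes: ...


class InMemoryModelStore:
    """Toy backend for illustration; a bought or self-built store could replace it."""

    def __init__(self) -> None:
        self._models: Dict[str, bytes] = {}

    def save(self, name: str, model_bytes: bytes) -> None:
        self._models[name] = model_bytes

    def load(self, name: str) -> bytes:
        return self._models[name]


def publish_model(store: ModelStore, name: str, model_bytes: bytes) -> bytes:
    """Pipeline step written against the interface, not against a concrete backend."""
    store.save(name, model_bytes)
    return store.load(name)


# Swapping the backend means passing a different object here; publish_model is unchanged.
store = InMemoryModelStore()
roundtrip = publish_model(store, "demo-model", b"serialized-model")
```

Because `publish_model` only knows about the two methods on the interface, replacing one component does not force a rewrite of the steps around it.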

What types of models is Walmart Global Tech deploying in production and why?

It depends on the area, requirements and end customers. Initially, I always start with the question: do we need machine learning to solve this problem, or is there an easier way to solve it that we should implement instead? If machine learning is needed, it’s often much easier and better to choose a simple regression model, such as linear regression, to ensure good performance. We use these types of models for the base cases. When there is a good existing model available, we often adopt or adapt it – like BERT for natural language processing.

I want to emphasize that for the model itself, trust is essential. Not everyone will trust the model. That’s why I said at the beginning that the simplest is often the best. It is better to avoid machine learning altogether or, if you do need it, to use a model that is easier to explain, such as linear regression. The black box nature of BERT or deep learning makes it more difficult to help people or customers understand the model.

Ultimately, if customers or people don’t trust the model, it’s useless. It is therefore essential to build a process to explain the model. It is also important to troubleshoot the model itself.
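
To illustrate why a linear model is easier to explain, here is a minimal sketch that fits a one-feature least-squares regression in plain Python and turns the fitted coefficients into a sentence a non-specialist can read. The data and the names `fit_simple_linear_regression` and `explain` are assumptions made up for this example; they do not come from the interview.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares with one feature: y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def explain(intercept, slope, x, feature_name, target_name):
    """Describe a single prediction in terms of the two fitted numbers."""
    prediction = intercept + slope * x
    return (
        f"Predicted {target_name} = {prediction:.2f}: a baseline of {intercept:.2f} "
        f"plus {slope:.2f} for each unit of {feature_name} ({feature_name} = {x})."
    )


# Hypothetical toy data: promotions run in a week vs. units sold.
promotions = [0, 1, 2, 3, 4]
units_sold = [10.0, 12.0, 14.0, 16.0, 18.0]

intercept, slope = fit_simple_linear_regression(promotions, units_sold)
print(explain(intercept, slope, 5, "promotions", "units_sold"))
```

The whole model is two numbers, so the explanation is the model itself. A deep network offers no comparably direct reading of its parameters.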

So model explainability and the ability to trust a model’s decisions are really important to your team?

Yes, it is important not only for the model but also for the product and its customers. If you can explain a model to a customer or user, you can also explain it to yourself. So it’s also a win-win. Nobody likes a black box.

What is your strategy for model monitoring and model performance management?

Since change always happens, monitoring is really the key to successful MLOps. Whether from a data engineering or machine learning engineering perspective, we are always tasked with monitoring all pipeline or infrastructure processes. The data engineer, for example, will review whether there are any data quality issues, data mismatches, missing data, etc.

For machine learning, monitoring covers both the data and the model itself. We watch data drift, concept drift and performance on key indicators (i.e. CSA) to get to the bottom of issues and inform the retraining process. There’s a lot you can track, so having access to key metrics for root cause analysis and getting notifications and alerts is really helpful.
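
As a rough illustration of a data drift check, the sketch below computes the population stability index (PSI) between a baseline sample of one feature and a recent sample, then raises a flag when the score crosses a threshold. This is one common technique, not necessarily the one Zhu's team uses. The function names, the bin count and the 0.2 threshold (a frequently cited rule of thumb) are assumptions for the example.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample (expected) and a recent sample (actual)."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if lo == hi:
        raise ValueError("baseline sample must contain at least two distinct values")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected, actual, threshold=0.2):
    """Return True when the PSI exceeds the (assumed) alerting threshold."""
    return population_stability_index(expected, actual) > threshold


# Hypothetical feature values: the recent sample is shifted upward by 0.5.
baseline = [i / 100 for i in range(100)]
recent = [v + 0.5 for v in baseline]

print(drift_alert(baseline, baseline))  # same distribution, no alert expected
print(drift_alert(baseline, recent))    # shifted distribution, alert expected
```

A check like this covers only the input data. Concept drift and model performance need labels or feedback, so they are usually tracked with separate metrics.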

It must be a really interesting time at Walmart considering record demand, supply chain challenges, inflation and more. Have you encountered any interesting issues with production models reacting to a new environment?

Definitely yes. The only constant is that the data is constantly changing. A model trained on social media data, for example, can see significant impacts on model performance when social media data changes drastically or disappears overnight. Problems like these are very common.

Half of the data scientists we recently interviewed (50.3%) say their business counterparts don’t understand machine learning. How did you manage to overcome this obstacle to evolve your ML practice?

This kind of situation is common in the industry. As noted, some models are black boxes – and few people trust a black box they cannot see into, which is why explainability is so important. If your clients can look at a model and understand why it made a particular decision, trust will increase over time.

For models that directly impact customers, how do you incorporate customer feedback into your models?

Customer feedback is so important. The data may change or the concept may change, but if customer feedback is part of the ML process, we can use that customer data to retrain the model in near real time, which gives us better model performance and a better ability to predict reality. Having a human in the loop to check things can help ensure models are relevant and performing well.

What is your most and least favorite part of your current role?

I love data and I enjoy playing with data, so that’s definitely one of my favorite things about my current role. Incidentally, this is also one of the hardest parts of the job because you need to know the data well before putting it into a model. In terms of machine learning, one of the hardest things is how to choose the right approach – not just for the model, not just for the data, but also for the tech stacks, scalability, and everything else in the ML pipeline.
