In this roundtable discussion, Yong and George shared valuable insights on the data science life cycle. The conversation covered the stages of the life cycle, the challenges each stage presents, and the strategies the speakers have adopted to generate value in their own use cases. It also touched on resource allocation, the role of platforms, and the potential for automation to accelerate the data science process.


The conversation began with an exploration of the data science life cycle. Yong initiated the discussion by posing the question, “What stages are included in the data science life cycle?” This led to an in-depth conversation about the different phases and tasks involved, ranging from data collection and cleaning to model deployment and continuous evaluation.

Challenges and Strategies

One of the key challenges highlighted by the speakers was MLOps (machine learning operations). George emphasized the difficulty of deploying and monitoring computer vision models, in particular detecting and addressing performance degradation and the various types of drift. Yong shared his experience with data quality issues and with continuously evaluating model performance against the vast amount of unseen production data.
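The drift monitoring George describes can take many forms; as one illustrative sketch (not the speakers' actual pipeline), the Population Stability Index compares how a numeric feature or model score is distributed at training time versus in production. The function, data, and threshold below are all hypothetical:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training-time)
    sample and a production sample of one numeric feature or score."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            # clamp out-of-range production values into the edge bins
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        # small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1 * i for i in range(100)]        # scores seen at training time
shifted  = [0.1 * i + 3.0 for i in range(100)]  # drifted production scores
```

A common rule of thumb treats PSI above roughly 0.2 as a signal of meaningful drift worth investigating, though the right threshold depends on the use case.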

The speakers also stressed the central role of human labelers in the data science life cycle. They discussed the difficulty of maintaining consistent label quality, especially when labeling work overflows to a third-party team. Developing strategies to audit the accuracy of human labelers emerged as essential to ensuring reliable results.
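One common way to audit labelers (a sketch of the general idea, not necessarily what the speakers use) is to seed each batch with "golden" items whose true labels are already known, then score every labeler against them. The item IDs, labels, and pass threshold here are hypothetical:

```python
# Golden items: IDs whose true labels were verified in advance.
golden = {"img_001": "cat", "img_002": "dog", "img_003": "cat"}

def audit(labeler_answers, golden, threshold=0.9):
    """Return (accuracy, passed) for one labeler's answers on golden items."""
    scored = [item for item in labeler_answers if item in golden]
    if not scored:
        return None, False  # labeler saw no golden items; cannot audit
    correct = sum(labeler_answers[i] == golden[i] for i in scored)
    acc = correct / len(scored)
    return acc, acc >= threshold

# This labeler got 2 of the 3 golden items right (img_003 is wrong);
# img_104 is an ordinary, non-golden item and is not scored.
answers = {"img_001": "cat", "img_002": "dog",
           "img_003": "dog", "img_104": "cat"}
acc, passed = audit(answers, golden)
```

Labelers who fall below the threshold can be retrained or have their recent work re-reviewed before it enters the training set.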

Value Generation Stage

When discussing which stage generates the most value, Yong and George both agreed that data quality plays the decisive role. They emphasized addressing data cleanliness, integration, and exploration to improve model accuracy. Yong further highlighted the need to adapt and optimize models for different data characteristics and diverse user requirements.
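The cleanliness checks mentioned above often start very simply. The following is a minimal sketch of two such checks on a hypothetical record layout (the fields and records are invented for illustration):

```python
# Hypothetical dataset: a list of dict records with an "id" key.
records = [
    {"id": 1, "age": 34,   "country": "US"},
    {"id": 2, "age": None, "country": "US"},
    {"id": 3, "age": 34,   "country": "DE"},
    {"id": 1, "age": 34,   "country": "US"},  # duplicate id
]

def missing_rate(records, field):
    """Fraction of records where `field` is missing (None)."""
    return sum(r.get(field) is None for r in records) / len(records)

def duplicate_ids(records, key="id"):
    """Return the set of key values that appear more than once."""
    seen, dupes = set(), set()
    for r in records:
        k = r[key]
        (dupes if k in seen else seen).add(k)
    return dupes
```

Checks like these can run automatically on every new data drop, turning the "data cleanliness" stage from a one-off manual effort into a continuous gate.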

Resource Allocation

The conversation then shifted to resource allocation: the time, people, and platforms needed to support each stage of the data science life cycle. Yong emphasized allocating resources to continuously monitor and automate processes. He suggested starting with time-boxed efforts early in a project and gradually investing in automation as the project matures. Both speakers agreed that a unified platform and a mature community support system make resource allocation more effective.

Accelerating the Data Science Life Cycle

When asked about their wishlist for accelerating or shortening the data science life cycle, George emphasized the significance of having the entire data science and machine learning stack in one of the public clouds. He acknowledged the rapid advancements in tooling and automation offered by public cloud services, which have the potential to streamline the data science process and help organizations achieve faster, more accurate results.


The roundtable discussion between Yong and George provided valuable insights into the data science life cycle. Their experiences and expertise shed light on the challenges and opportunities faced in different stages of the process. By addressing data quality, optimizing resource allocation, and leveraging advanced tooling, organizations can accelerate their data science initiatives and unlock the full potential of machine learning.

This conversation highlights the evolving nature of the data science field and emphasizes the importance of continuous learning and adaptation. As the industry matures and new tools and platforms emerge, data scientists and machine learning practitioners must stay informed and embrace the opportunities for growth and optimization offered by automation and advanced technologies.

Ultimately, the data science life cycle is a dynamic, multifaceted process that demands careful planning, deliberate resource allocation, and continuous evaluation.

Disclaimer: All guests’ views are their own and do not represent their employers’.