April 7, 2024 - last updated

Streamlining GenAI projects: A data science leader’s guide

Shahar Abramov

Shahar is enthusiastic about creating AI-enabled software. He is co-founder and CTO at Really Great Tech (RGT), a global software development company, dedicated to being at the forefront of modern web, AI, and Web3.

9 min read Feb 20, 2024

Managing GenAI projects has quickly become a whole new beast. We already know how to lead, innovate, and iterate effectively on “classic” software projects. However, product development that relies on generative AI remains relatively new territory and requires its own stable methodologies to reduce risk and make iterations more predictable.

Over the next few minutes, we’ll walk step by step through leading high-performing GenAI teams and streamlining the development of GenAI-backed software. Even though our examples here mostly focus on LLMs (Large Language Models), the concepts are relevant to all kinds of generative AI, be it image generation, audio, or multimodal AI.

Let’s go!

Align teams on GenAI projects 

Effective team alignment is essential in GenAI projects involving LLM-based applications. It’s about ensuring everyone understands how these systems work and their practical implications. Explaining technical terms like “context window” and “RAG” to non-technical teammates is part of this. However, the real goal is a comprehensive grasp of how LLMs function, their strengths, and their limitations.

For instance, your Product Manager might think it’s as simple as feeding the LLM loads of data and giving it a task. But you know that throwing in a lot of noise can lead to poor performance and that you have to establish high-quality retrieval.

Every team member should be familiar with the basics of LLM development. This way, everyone can make informed decisions and avoid miscommunications or unrealistic expectations.

Adopt an iterative approach for GenAI projects

LLMs took the world by storm. However, while it might seem like LLMs suddenly make building generative AI-powered products simpler than ever, developing production-ready LLM-based solutions is a complex endeavor. Given the multitude of users and potential edge cases, encountering errors, hallucinations, prompt injections, and other undesired behavior is nearly inevitable.

It’s important, then, to adopt an iterative approach in your development process.

In some cases, you may prefer to hold back a powerful feature and release a more limited version first. Your users will likely appreciate a stable, predictable product that consistently performs well, even if it only addresses a very specific problem. This approach can significantly enhance user satisfaction, as users get a reliable tool that works 95% of the time. Trying to solve end-to-end problems from day one can lead to confused and frustrated users, potentially driving them away after just a few attempts.

Let’s imagine your company’s main product is a recruiting/HR platform like Greenhouse or Lever. You could implement an AI screening tool that reviews resumes and candidate profiles to automatically recommend the most qualified people for open positions. However, the risk of bias and incorrect screening is high if such a powerful tool is unleashed without oversight.

Instead, you may want to focus the first version only on initial resume scoring based on hard skills and experience. While still applying GenAI for language processing, bias risks are reduced by avoiding subjective assessments of soft skills or cultural fit. As your models improve with more diverse training data and feedback loops, the AI’s scope can expand from scoring into recommendations under human supervision. Taking an incremental approach allows automation where possible while ensuring human accountability over high-impact decisions.

Control AI risks in GenAI projects

Regardless of how you choose to roll out your AI application, it’s important to know exactly what happens once it’s out in the wild.

Setting up robust AI control from the start will save you many headaches down the road. This includes mitigating hallucinations and managing GenAI risks, tracking costs, usage, query types, and the relevance of responses. It also requires ensuring you understand user behaviors, model weaknesses, and high-value answers.
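As a rough illustration of the tracking described above, the sketch below wraps a model call and records the query type, an approximate token count, latency, and an estimated cost per request. Everything in it is hypothetical: `fake_model` stands in for whatever LLM client you use, the whitespace-based token count is a crude approximation, and the price constant is a placeholder rather than any vendor's real rate.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Placeholder price, NOT a real vendor rate; replace with your provider's pricing.
ASSUMED_PRICE_PER_1K_TOKENS = 0.002


@dataclass
class UsageRecord:
    query_type: str
    prompt_tokens: int
    completion_tokens: int
    latency_s: float
    estimated_cost: float


@dataclass
class UsageTracker:
    records: List[UsageRecord] = field(default_factory=list)

    def tracked_call(self, call_model: Callable[[str], str], prompt: str, query_type: str) -> str:
        """Call the model and log usage for this request."""
        start = time.perf_counter()
        response = call_model(prompt)
        latency = time.perf_counter() - start
        # Crude token estimate: whitespace-separated words.
        prompt_tokens = len(prompt.split())
        completion_tokens = len(response.split())
        cost = (prompt_tokens + completion_tokens) / 1000 * ASSUMED_PRICE_PER_1K_TOKENS
        self.records.append(
            UsageRecord(query_type, prompt_tokens, completion_tokens, latency, cost)
        )
        return response

    def total_cost(self) -> float:
        return sum(r.estimated_cost for r in self.records)

    def calls_by_type(self) -> Dict[str, int]:
        counts: Dict[str, int] = {}
        for r in self.records:
            counts[r.query_type] = counts.get(r.query_type, 0) + 1
        return counts


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM client."""
    return "This is a canned answer."


tracker = UsageTracker()
tracker.tracked_call(fake_model, "What is our refund policy?", query_type="support")
tracker.tracked_call(fake_model, "Summarize this ticket", query_type="summarization")
```

In a real system these records would more likely go to your logging or observability stack than to an in-memory list, but the shape of the data is the useful part: once every call carries a query type and a cost estimate, questions about usage patterns become simple aggregations.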

Solutions like Aporia Guardrails combine these capabilities with proactive AI security to secure your AI interactions. Layered between your LLM/API and your GenAI project’s interface, guardrails work behind the scenes, filtering responses and preventing risks to generative AI apps in real time, based on your needs.
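To make the idea of a layer between the model and the interface concrete, here is a minimal, generic sketch. It is not Aporia's API or any vendor's implementation: the policy (one blocked topic and email redaction), the fallback message, and `fake_model` are all assumptions chosen for illustration.

```python
import re
from typing import Callable

# Hypothetical policy: block one forbidden topic and redact email addresses.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKED_TOPICS = ("internal salary data",)
FALLBACK_MESSAGE = "Sorry, I can't help with that request."


def with_guardrails(call_model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model call so every response is checked before reaching the UI."""

    def guarded(prompt: str) -> str:
        response = call_model(prompt)
        lowered = response.lower()
        if any(topic in lowered for topic in BLOCKED_TOPICS):
            return FALLBACK_MESSAGE
        # Redact rather than block when the only issue is an email address.
        return EMAIL_PATTERN.sub("[redacted email]", response)

    return guarded


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM client; returns canned text keyed on the prompt."""
    if "contact" in prompt:
        return "You can reach Dana at dana@example.com for details."
    if "pay" in prompt:
        return "According to internal salary data, the range is confidential."
    return "Our office opens at 9am."


guarded_model = with_guardrails(fake_model)
```

Because the interface only ever calls `guarded_model`, policies can be tightened or loosened without touching the model client or the UI code. Production guardrails typically go well beyond keyword and regex checks.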

Use mocks to streamline GenAI development

Different teams have different approaches to how they manage their projects. But nearly every software team ends up relying on mocks and placeholders. When working on any new system or feature, you will often depend on components that are not yet functional or reliable, and you will have to mock them until they are ready.

When delivering AI-backed software, it’s not that different, although sometimes this principle is forgotten. One of the data scientists might work meticulously to fine-tune a model that would give just the right output. Meanwhile, another engineer might want to start working on the end-to-end flow of the system, which relies on that model. So, one of them can just set up a mock API that will later be replaced with real predictions.
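One way this could look in practice is sketched below: the application code depends on a small contract, and a mock that satisfies it returns a fixed answer until the fine-tuned model is ready. The sentiment task, the `SentimentModel` protocol, and `handle_review` are hypothetical names used only for illustration.

```python
from typing import Protocol


class SentimentModel(Protocol):
    """Contract that both the mock and the real model must satisfy (hypothetical)."""

    def predict(self, text: str) -> str: ...


class MockSentimentModel:
    """Returns a fixed label so the end-to-end flow can be built before the real model exists."""

    def predict(self, text: str) -> str:
        return "neutral"


def handle_review(review: str, model: SentimentModel) -> dict:
    """Application logic that depends only on the contract, not on a concrete model."""
    label = model.predict(review)
    return {"review": review, "sentiment": label, "needs_followup": label == "negative"}


result = handle_review("The onboarding was fine.", MockSentimentModel())
```

When the real model is ready, it only needs to expose the same `predict` method; `handle_review` and everything built on top of it stay unchanged.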

Mocking is not only effective for faster development, but it is also an important part of the evaluation process. Depending on how complex your pipelines and chains are, it might be better practice to split your flow into separate components and test each component separately with mocked data. Then, test the flow as a whole.
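As a small sketch of testing one component in isolation, the example below checks a prompt-building step against mocked retrieval results using Python's standard `unittest.mock`. The `build_prompt` function and the retriever's `search(question, top_k=...)` interface are assumptions made for this example, not part of any specific framework.

```python
from unittest.mock import Mock


def build_prompt(question: str, retriever) -> str:
    """Prompt-building component: fetches context and formats the final prompt."""
    documents = retriever.search(question, top_k=2)
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def test_build_prompt_with_mocked_retrieval():
    # Mocked retrieval: fixed documents, no vector store or network involved.
    retriever = Mock()
    retriever.search.return_value = ["Refunds take 5 days.", "Refunds require a receipt."]

    prompt = build_prompt("How long do refunds take?", retriever)

    retriever.search.assert_called_once_with("How long do refunds take?", top_k=2)
    assert "- Refunds take 5 days." in prompt
    assert prompt.endswith("Question: How long do refunds take?")


test_build_prompt_with_mocked_retrieval()
```

A failure here points at the prompt-building logic alone, which is much easier to debug than a failure somewhere in a full retrieval-plus-generation run.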

Apply separation of concerns 

Again, this is a standard software design principle that is not yet well-established in teams working on GenAI products. One possible reason is the wide adoption of LLM frameworks like LangChain, which are extremely powerful but also tempting to misuse or overextend. As a developer, you might be tempted to cram all of your LLM logic into one monolithic controller.

Another reason is the lack of clear standards and examples. For many developers, especially those new to GenAI, separation of concerns is understood in theory but challenging to implement in practice. This often leads to application logic mixed with LLM logic or config, creating spaghetti code that’s difficult to untangle.

Let’s examine one possible approach for solving this and delve into how we can effectively break down the main stages most LLM pipelines go through, ensuring a clean, modular approach that fosters innovation and maintains agility.

Optimizing the GenAI development pipeline

On a very high level, most LLM flows go through a few simple stages. It usually comes down to some form of:

  1. Input processing
  2. Retrieval
  3. Prediction

Ideally, you would have one or several teammates dedicated to each of these stages, creating APIs for each other to consume.

Considering these high-level stages, we can further break them down into various components depending on the task.

Dedicated attention to each of these parts in the pipeline may sound like overkill at first, and it might be if your use-case is very simple. But for robust AI applications, it quickly becomes necessary. Let’s understand why by zooming in to the stages.

Stage 1 – Input processing

The first stage may range from just basic input validations, all the way to complex transformation and NLP operations. This might include:

  1. Data transformation, cleanup, or formatting
  2. Sensitive data obfuscation & censoring
  3. Passing input through another AI chain, e.g. for content-filtering
  4. Calling your retrieval API to fetch context
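The first two items in this list can be sketched with the standard library alone. The example below validates input, normalizes whitespace, and masks emails and phone numbers before anything reaches the model. The regular expressions are deliberately loose and the length limit is an assumed value; real sensitive-data detection usually needs a dedicated tool.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Loose phone pattern for illustration only; real-world formats vary widely.
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
MAX_INPUT_CHARS = 2000  # Assumed limit; tune to your context window and use case.


def process_input(raw: str) -> str:
    """Validate, clean up, and obfuscate user input before it reaches the model."""
    text = " ".join(raw.split())  # Collapse whitespace and strip the ends.
    if not text:
        raise ValueError("Input is empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("Input is too long")
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


cleaned = process_input("  Hi, I'm Sam.\nEmail me at sam@example.com or call +1 555-123-4567.  ")
# cleaned == "Hi, I'm Sam. Email me at [EMAIL] or call [PHONE]."
```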

Stage 2 – Retrieval

Designing a high-quality retrieval pipeline is a challenging and important task. Inaccurate retrieval can result in hallucinations and incorrect outputs. This stage includes:

  1. Data processing that serves your use-case (Chunking? Splitting? Map-Reduce?)
  2. Creating embeddings
  3. Testing and evaluating the retrieval
  4. Ranking (or Reranking) the retrieved documents
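The steps above can be sketched end to end in a few lines. Note the assumptions: the "embedding" here is a toy bag-of-words count standing in for a real embedding model, the chunk size and overlap are arbitrary, and ranking is plain cosine similarity with no separate reranker.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into overlapping windows of words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # The last window already reached the end of the text.
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return the top_k best matches."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


document = (
    "Refunds are processed within five business days. "
    "Shipping is free for orders above fifty dollars. "
    "Support is available by chat on weekdays."
)
chunks = chunk_words(document, chunk_size=8, overlap=2)
top = retrieve("How long do refunds take?", chunks, top_k=1)
```

Even a toy setup like this is enough to start the "testing and evaluating" step: collect a handful of questions with known source passages and check whether the expected chunk comes back on top.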

Stage 3 – Prediction

Teammates focused on the last stage might be in charge of:

  1. Prompt engineering
  2. Prompt evaluations
  3. Model training or fine-tuning
  4. Hallucination mitigation
  5. Guardrails
  6. Model deployment, versioning, tooling and observability
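Prompt evaluation, in particular, benefits from being treated like any other test suite. The sketch below runs a prompt template against a small set of labeled cases and reports accuracy. The template, the cases, and `keyword_stub_model` (a deterministic stand-in for a real LLM call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

PROMPT_TEMPLATE = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: {review}\n"
    "Answer:"
)


@dataclass
class EvalCase:
    review: str
    expected: str


def evaluate_prompt(
    template: str, cases: List[EvalCase], call_model: Callable[[str], str]
) -> float:
    """Return the fraction of cases where the model output matches the expected label."""
    passed = 0
    for case in cases:
        output = call_model(template.format(review=case.review))
        if output.strip().lower() == case.expected:
            passed += 1
    return passed / len(cases)


def keyword_stub_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM so the harness can run offline."""
    review = prompt.split("Review:", 1)[1]
    return "negative" if "broken" in review.lower() else "positive"


cases = [
    EvalCase("Love it, works perfectly.", "positive"),
    EvalCase("Arrived broken and late.", "negative"),
    EvalCase("Terrible support experience.", "negative"),
]
accuracy = evaluate_prompt(PROMPT_TEMPLATE, cases, keyword_stub_model)
```

Here the stub gets two of the three cases right, and the miss on the third is exactly the kind of signal a harness like this surfaces. Swapping the stub for a real model call lets you compare prompt versions on the same cases before shipping a change.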

The whole flow

So a full-fledged pipeline ends up chaining every component listed above, from input cleanup through retrieval and ranking to a guarded prediction.

When zooming into the process, we find many distinct challenges that require special attention. Just look at how many things we have to do before even calling our model in stage 3!

So it’s a good idea to split your pipeline into separate APIs. This lets each teammate focus on delivering the highest-quality result for their stage while keeping your infrastructure maintainable in the long run.
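A minimal sketch of that split might look like the following, with each stage expressed as a plain callable behind a small interface. The type aliases and the three stand-in functions are hypothetical; in practice each stage could just as well be an HTTP service owned by a different teammate.

```python
from dataclasses import dataclass
from typing import Callable, List

# Each stage is a plain callable, so a teammate can swap in a real service,
# an HTTP client, or a mock without touching the other stages.
InputProcessor = Callable[[str], str]
Retriever = Callable[[str], List[str]]
Predictor = Callable[[str, List[str]], str]


@dataclass
class Pipeline:
    process_input: InputProcessor
    retrieve: Retriever
    predict: Predictor

    def run(self, raw_query: str) -> str:
        query = self.process_input(raw_query)
        context = self.retrieve(query)
        return self.predict(query, context)


# Hypothetical stand-ins for the three stages.
def strip_and_lower(raw: str) -> str:
    return raw.strip().lower()


def static_retriever(query: str) -> List[str]:
    return ["Refunds take five business days."] if "refund" in query else []


def template_predictor(query: str, context: List[str]) -> str:
    if not context:
        return "I don't have enough information to answer."
    return f"Based on our docs: {context[0]}"


pipeline = Pipeline(strip_and_lower, static_retriever, template_predictor)
answer = pipeline.run("  How long do REFUNDS take?  ")
```

The orchestration code never needs to know how a stage works internally, only what it accepts and returns. That is what lets stages be developed, mocked, and evaluated independently.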

Collaborative prototyping

Getting regular input from both business and technical stakeholders is vital for building usable AI products. We need feedback not just on technical feasibility but real-world usability. We can facilitate this via interactive prototyping sessions.

Get product managers, UX experts, and engineers together to engage with early prototypes. The definition of “prototype” here is flexible – choose formats providing the most meaningful feedback for each development stage.

Initially, sessions may only evaluate various prompts and outputs together, comparing them to the ideal desired behavior. This enables gathering feedback and potential test cases early on and without a complex setup.

As capabilities advance, tools like Flowise let you create editable click-through prototypes that stakeholders can visualize and test hands-on. Besides being a great solution for quick prototyping, this also makes it much easier to explain your process to all stakeholders. You can then use the session to review the flow together, let everyone experiment, and tweak things on the fly.

These are just a few examples of how you can achieve quick wins with relatively low effort. Adopting frequent collaborative prototyping, focused on priority issues or features, can accelerate development dramatically. The upfront alignment and shared understanding enable more decentralized decision-making down the road.

Key takeaways for leading GenAI projects

As we have explored, successfully managing advanced GenAI projects requires adopting new methodologies tailored to this emerging field. Core focus areas include:

  • Ensuring full team alignment on concepts and terminology.
  • Controlling your AI and taking an iterative approach that focuses on reliability before expanding scope.
  • Leveraging modular design for efficient development.
  • Conducting regular collaborative sessions.

By establishing these foundational practices, leaders can set their teams up for effective and fast iteration, and ultimately the delivery of impactful AI-powered products.

As the space continues to mature, we will keep refining the playbooks for running high-velocity GenAI teams. For now, concentrating on these areas will lead to major improvements in stability and velocity.

