Practice Free Databricks Generative AI Engineer Associate Exam Online Questions
A Generative AI Engineer is developing a patient-facing, healthcare-focused chatbot. If the patient’s question is not a medical emergency, the chatbot should solicit more information from the patient to pass to the doctor’s office and suggest a few relevant pre-approved medical articles for reading. If the patient’s question is urgent, it should direct the patient to call their local emergency services.
Given the following user input:
“I have been experiencing severe headaches and dizziness for the past two days.”
Which response is most appropriate for the chatbot to generate?
- A . Here are a few relevant articles for your browsing. Let me know if you have questions after reading them.
- B . Please call your local emergency services.
- C . Headaches can be tough. Hope you feel better soon!
- D . Please provide your age, recent activities, and any other symptoms you have noticed along with your headaches and dizziness.
B
Explanation:
Problem Context: The task is to design responses for a healthcare-focused chatbot that appropriately addresses the urgency of a patient’s symptoms.
Explanation of Options:
Option A: Suggesting articles might be suitable for less urgent inquiries but is inappropriate for symptoms that could indicate a serious condition.
Option B: Given the description of severe symptoms like headaches and dizziness, directing the patient to emergency services is prudent. This aligns with medical guidelines that recommend immediate professional attention for such severe symptoms.
Option C: Offering well-wishes does not address the potential seriousness of the symptoms and lacks appropriate action.
Option D: While gathering more information is part of a detailed assessment, the immediate need here suggests a more urgent response.
Given the potential severity of the described symptoms, Option B is the most appropriate, ensuring the chatbot directs patients to seek urgent care when needed, potentially saving lives.
When developing an LLM application, it’s crucial to ensure that the data used for training the model complies with licensing requirements to avoid legal risks.
Which action is NOT appropriate to avoid legal risks?
- A . Reach out to the data curators directly before you have started using the trained model to let them know.
- B . Use any available data you personally created which is completely original and you can decide what license to use.
- C . Only use data explicitly labeled with an open license and ensure the license terms are followed.
- D . Reach out to the data curators directly after you have started using the trained model to let them know.
D
Explanation:
Problem Context: When using data to train a model, it’s essential to ensure compliance with licensing to avoid legal risks. Legal issues can arise from using data without permission, especially when it comes from third-party sources.
Explanation of Options:
Option A: Reaching out to data curators before using the data is an appropriate action. This allows you to ensure you have permission or understand the licensing terms before starting to use the data in your model.
Option B: Using original data that you personally created is a safe option. Since you have full ownership of the data, you control the licensing, so there are no third-party legal risks.
Option C: Using data that is explicitly labeled with an open license and adhering to the license terms is a correct and recommended approach. This ensures compliance with legal requirements.
Option D: Reaching out to the data curators after you have already started using the trained model is not appropriate. If you’ve already used the data without understanding its licensing terms, you may have already violated the terms of use, which could lead to legal complications. It’s essential to clarify the licensing terms before using the data, not after.
Thus, Option D is not appropriate because it could expose you to legal risks by using the data without first obtaining the proper licensing permissions.
A team wants to serve a code generation model as an assistant for their software developers. It should support multiple programming languages. Quality is the primary objective.
Which of the Databricks Foundation Model APIs, or models available in the Marketplace, would be the best fit?
- A . Llama2-70b
- B . BGE-large
- C . MPT-7b
- D . CodeLlama-34B
D
Explanation:
For a code generation model that supports multiple programming languages and where quality is the primary objective, CodeLlama-34B is the most suitable choice. Here’s the reasoning:
Specialization in Code Generation:
CodeLlama-34B is specifically designed for code generation tasks. This model has been trained with a focus on understanding and generating code, which makes it particularly adept at handling various programming languages and coding contexts.
Capacity and Performance:
The "34B" indicates a model size of 34 billion parameters, suggesting a high capacity for handling complex tasks and generating high-quality outputs. The large model size typically correlates with better understanding and generation capabilities in diverse scenarios.
Suitability for Development Teams:
Given that the model is optimized for code, it will be able to assist software developers more effectively than general-purpose models. It understands coding syntax, semantics, and the nuances of different programming languages.
Why Other Options Are Less Suitable:
A (Llama2-70b): While also a large model, it’s more general-purpose and may not be as fine-tuned for code generation as CodeLlama.
B (BGE-large): This model may not specifically focus on code generation.
C (MPT-7b): Smaller than CodeLlama-34B and likely less capable in handling complex code generation tasks at high quality.
Therefore, for a high-quality, multi-language code generation application, CodeLlama-34B (option D) is the best fit.
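As a rough illustration of how a deployed code generation model can be queried from Databricks, the sketch below uses the MLflow Deployments client. It assumes a CodeLlama-34B model from the Marketplace has already been deployed to a serving endpoint; the endpoint name `codellama-34b-endpoint` is a placeholder, not an official Foundation Model API name.

```python
# Minimal sketch: querying a served code generation model via the MLflow Deployments client.
# Assumption: a CodeLlama-34B Marketplace model is already deployed to an endpoint named
# "codellama-34b-endpoint" (placeholder name).
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="codellama-34b-endpoint",  # placeholder endpoint name
    inputs={
        "messages": [
            {"role": "user", "content": "Write a Python function that reverses a linked list."}
        ],
        "max_tokens": 512,
    },
)
print(response)
```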
A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.
Which will fulfill their need?
- A . context length 514: smallest model is 0.44GB and embedding dimension 768
- B . context length 2048: smallest model is 11GB and embedding dimension 2560
- C . context length 32768: smallest model is 14GB and embedding dimension 4096
- D . context length 512: smallest model is 0.13GB and embedding dimension 384
D
Explanation:
When prioritizing cost and latency over quality in a Large Language Model (LLM)-based application, it is crucial to select a configuration that minimizes both computational resources and latency while still providing reasonable performance.
Here’s why D is the best choice:
Context length: The context length of 512 tokens aligns with the chunk size used for the documents (maximum of 512 tokens per chunk). This is sufficient for capturing the needed information and generating responses without unnecessary overhead.
Smallest model size: The model with a size of 0.13GB is significantly smaller than the other options. This small footprint ensures faster inference times and lower memory usage, which directly reduces both latency and cost.
Embedding dimension: While the embedding dimension of 384 is smaller than the other options, it is still adequate for tasks where cost and speed are more important than precision and depth of understanding.
This setup achieves the desired balance between cost-efficiency and reasonable performance in a latency-sensitive, cost-conscious application.
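To make the cost/latency trade-off concrete, the sketch below embeds 512-token-capped chunks with a small 384-dimension model. The specific model name (`all-MiniLM-L6-v2`) is an illustrative assumption of a model in this size class; it is not named in the question.

```python
# Minimal sketch: embedding document chunks with a small 384-dimensional model.
# Assumption: the sentence-transformers package is installed; all-MiniLM-L6-v2 is one
# example of a sub-1GB model that produces 384-dimensional embeddings.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small footprint, 384-dim output

chunks = [
    "First document chunk, at most 512 tokens...",
    "Second document chunk, at most 512 tokens...",
]
embeddings = model.encode(chunks)
print(embeddings.shape)  # (2, 384)
```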
A Generative AI Engineer would like an LLM to generate formatted JSON from emails. This will require parsing and extracting the following information: order ID, date, and sender email.
Here’s a sample email:
The Engineer needs to write a prompt that extracts the relevant information in JSON format with the highest possible output accuracy.
Which prompt will do that?
- A . You will receive customer emails and need to extract date, sender email, and order ID. You should return the date, sender email, and order ID information in JSON format.
- B . You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format.
Here’s an example: {“date”: “April 16, 2024”, “sender_email”: “[email protected]”, “order_id”: “RE987D”}
- C . You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in a human-readable format.
- D . You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format.
B
Explanation:
Problem Context: The goal is to parse emails to extract certain pieces of information and output this in a structured JSON format. Clarity and specificity in the prompt design will ensure higher accuracy in the LLM’s responses.
Explanation of Options:
Option A: Provides a general guideline but lacks an example, which helps an LLM understand the exact format expected.
Option B: Includes a clear instruction and a specific example of the output format. Providing an example is crucial as it helps set the pattern and format in which the information should be structured, leading to more accurate results.
Option C: Does not specify that the output should be in JSON format, thus not meeting the requirement.
Option D: While it correctly asks for JSON format, it lacks an example that would guide the LLM on how to structure the JSON correctly.
Therefore, Option B is optimal as it not only specifies the required format but also illustrates it with an example, enhancing the likelihood of accurate extraction and formatting by the LLM.
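For reference, a minimal sketch of how the Option B prompt (instruction plus one-shot example) could be assembled and its output parsed is shown below. The message structure and the sample model reply are illustrative assumptions, not part of the question.

```python
# Minimal sketch: building a one-shot JSON-extraction prompt and parsing the reply.
import json

SYSTEM_PROMPT = (
    "You will receive customer emails and need to extract date, sender email, and order ID. "
    "Return the extracted information in JSON format.\n"
    'Here\'s an example: {"date": "April 16, 2024", "sender_email": "[email protected]", '
    '"order_id": "RE987D"}'
)

def build_messages(email_body: str) -> list:
    """Pair the instruction-plus-example prompt with the email to be parsed."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": email_body},
    ]

# Downstream parsing of a hypothetical model reply into a Python dict.
raw_reply = '{"date": "April 16, 2024", "sender_email": "[email protected]", "order_id": "RE987D"}'
order = json.loads(raw_reply)
print(order["order_id"])  # RE987D
```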
A Generative AI Engineer received the following business requirements for an external chatbot.
The chatbot needs to identify what type of question the user is asking and route it to the appropriate model. For example, one user might ask about upcoming event details, while another might ask about purchasing tickets for a particular event.
What is an ideal workflow for such a chatbot?
- A . The chatbot should only look at previous event information
- B . There should be two different chatbots handling different types of user queries.
- C . The chatbot should be implemented as a multi-step LLM workflow. First, identify the type of question asked, then route the question to the appropriate model. If it’s an upcoming event question, send the query to a text-to-SQL model. If it’s about ticket purchasing, the customer should be redirected to a payment platform.
- D . The chatbot should only process payments
C
Explanation:
Problem Context: The chatbot must handle various types of queries and intelligently route them to the appropriate responses or systems.
Explanation of Options:
Option A: Limiting the chatbot to only previous event information restricts its utility and does not meet the broader business requirements.
Option B: Having two separate chatbots could unnecessarily complicate user interaction and increase maintenance overhead.
Option C: Implementing a multi-step workflow where the chatbot first identifies the type of question and then routes it accordingly is the most efficient and scalable solution. This approach allows the chatbot to handle a variety of queries dynamically, improving user experience and operational efficiency.
Option D: Focusing solely on payments would not satisfy all the specified user interaction needs, such as inquiring about event details.
Option C offers a comprehensive workflow that maximizes the chatbot’s utility and responsiveness to different user needs, aligning perfectly with the business requirements.
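A minimal sketch of the Option C routing idea is shown below; `classify_intent` and both handlers are hypothetical placeholders standing in for an LLM classification call, a text-to-SQL model, and a payment hand-off.

```python
# Minimal routing sketch (all helpers are hypothetical placeholders, not a Databricks API).

def classify_intent(user_query: str) -> str:
    """Step 1: label the query. In practice this would be a small LLM classification prompt."""
    if "ticket" in user_query.lower() or "buy" in user_query.lower():
        return "ticket_purchase"
    return "event_info"

def answer_event_question(user_query: str) -> str:
    """Placeholder: would send the query to a text-to-SQL model over an events table."""
    return "Here are the upcoming event details..."

def redirect_to_payment(user_query: str) -> str:
    """Placeholder: would hand the customer off to the payment platform."""
    return "Redirecting you to the ticket purchase page..."

def handle_query(user_query: str) -> str:
    """Step 2: route the query to the appropriate downstream system."""
    intent = classify_intent(user_query)
    if intent == "ticket_purchase":
        return redirect_to_payment(user_query)
    return answer_event_question(user_query)

print(handle_query("When is the next concert?"))
print(handle_query("I want to buy two tickets for Saturday."))
```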
A Generative AI Engineer is tasked with improving RAG quality by addressing its inflammatory outputs.
Which action would be most effective in mitigating the problem of offensive text outputs?
- A . Increase the frequency of upstream data updates
- B . Inform the user of the expected RAG behavior
- C . Restrict access to the data sources to a limited number of users
- D . Curate upstream data properly that includes manual review before it is fed into the RAG system
D
Explanation:
Addressing offensive or inflammatory outputs in a Retrieval-Augmented Generation (RAG) system is critical for improving user experience and ensuring ethical AI deployment.
Here’s why D is the most effective approach:
Manual data curation: The root cause of offensive outputs often comes from the underlying data used to train the model or populate the retrieval system. By manually curating the upstream data and conducting thorough reviews before the data is fed into the RAG system, the engineer can filter out harmful, offensive, or inappropriate content.
Improving data quality: Curating data ensures the system retrieves and generates responses from a high-quality, well-vetted dataset. This directly impacts the relevance and appropriateness of the outputs from the RAG system, preventing inflammatory content from being included in responses.
Effectiveness: This strategy directly tackles the problem at its source (the data) rather than just mitigating the consequences (such as informing users or restricting access). It ensures that the system consistently provides non-offensive, relevant information.
Other options, such as increasing the frequency of data updates or informing users about behavior expectations, may not directly mitigate the generation of inflammatory outputs.
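As a rough illustration, the sketch below shows what a pre-ingestion curation gate might look like in code: documents matching a blocklist are routed to a human review queue instead of the RAG index. The blocklist and flagging logic are illustrative assumptions, not a specific Databricks feature.

```python
# Minimal sketch of a pre-ingestion curation gate (illustrative only).
BLOCKLIST = {"offensive_term_1", "offensive_term_2"}  # placeholder terms

def needs_human_review(doc_text: str) -> bool:
    """Flag documents containing blocklisted terms for manual review before ingestion."""
    lowered = doc_text.lower()
    return any(term in lowered for term in BLOCKLIST)

raw_documents = ["First candidate document...", "Second candidate document..."]
approved = [d for d in raw_documents if not needs_human_review(d)]
flagged = [d for d in raw_documents if needs_human_review(d)]
# `approved` is ingested into the RAG retriever; `flagged` goes to a reviewer queue.
```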
A Generative AI Engineer has already trained an LLM on Databricks and it is now ready to be deployed.
Which of the following steps correctly outlines the easiest process for deploying a model on Databricks?
- A . Log the model as a pickle object, upload the object to Unity Catalog Volume, register it to Unity Catalog using MLflow, and start a serving endpoint
- B . Log the model using MLflow during training, directly register the model to Unity Catalog using the MLflow API, and start a serving endpoint
- C . Save the model along with its dependencies in a local directory, build the Docker image, and run the Docker container
- D . Wrap the LLM’s prediction function into a Flask application and serve using Gunicorn
B
Explanation:
Problem Context: The goal is to deploy a trained LLM on Databricks in the simplest and most integrated manner.
Explanation of Options:
Option A: This method involves unnecessary steps like logging the model as a pickle object, which is not the most efficient path in a Databricks environment.
Option B: Logging the model with MLflow during training and then using MLflow’s API to register and start serving the model is straightforward and leverages Databricks’ built-in functionalities for seamless model deployment.
Option C: Building and running a Docker container is a complex and less integrated approach within the Databricks ecosystem.
Option D: Using Flask and Gunicorn is a more manual approach and less integrated compared to the native capabilities of Databricks and MLflow.
Option B provides the most straightforward and efficient process, utilizing Databricks’ ecosystem to its full advantage for deploying models.
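A minimal sketch of the Option B flow is shown below, assuming Unity Catalog as the MLflow registry. The pipeline, catalog/schema/model names, and endpoint name are placeholders (gpt2 stands in for the trained LLM purely to keep the sketch runnable).

```python
# Minimal sketch: log, register to Unity Catalog, and serve (all names are placeholders).
import mlflow
from mlflow.deployments import get_deploy_client
from transformers import pipeline

mlflow.set_registry_uri("databricks-uc")  # register models in Unity Catalog

# Stand-in for the already-trained LLM (gpt2 is used only for illustration).
pipeline_obj = pipeline("text-generation", model="gpt2")

# 1. Log the model with MLflow and register it to Unity Catalog in one step.
with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline_obj,
        artifact_path="model",
        input_example="Write a haiku about data.",
        registered_model_name="main.default.my_llm",  # catalog.schema.model (placeholder)
    )

# 2. Start a serving endpoint for the registered model version.
client = get_deploy_client("databricks")
client.create_endpoint(
    name="my-llm-endpoint",  # placeholder endpoint name
    config={
        "served_entities": [
            {
                "entity_name": "main.default.my_llm",
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
)
```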
A Generative AI Engineer interfaces with an LLM whose prompt/response behavior has been trained on customer calls inquiring about product availability. The LLM is designed to output “In Stock” if the product is available or only the term “Out of Stock” if not.
Which prompt will allow the engineer to obtain the correct call classification labels?
- A . Respond with “In Stock” if the customer asks for a product.
- B . You will be given a customer call transcript where the customer asks about product availability. The outputs are either “In Stock” or “Out of Stock”. Format the output in JSON, for example: {“call_id”: “123”, “label”: “In Stock”}.
- C . Respond with “Out of Stock” if the customer asks for a product.
- D . You will be given a customer call transcript where the customer inquires about product availability. Respond with “In Stock” if the product is available or “Out of Stock” if not.
B
Explanation:
Problem Context: The Generative AI Engineer needs a prompt that will enable an LLM trained on customer call transcripts to classify and respond correctly regarding product availability. The desired response should clearly indicate whether a product is "In Stock" or "Out of Stock," and it should be formatted in a way that is structured and easy to parse programmatically, such as JSON.
Explanation of Options:
Option A: Respond with “In Stock” if the customer asks for a product. This prompt is too generic and does not specify how to handle the case when a product is not available, nor does it provide a structured output format.
Option B: This option is correctly formatted and explicit. It instructs the LLM to respond based on the availability mentioned in the customer call transcript and to format the response in JSON. This structure allows for easy integration into systems that may need to process this information automatically, such as customer service dashboards or databases.
Option C: Respond with “Out of Stock” if the customer asks for a product. Like option A, this prompt is also insufficient as it only covers the scenario where a product is unavailable and does not provide a structured output.
Option D: While this prompt correctly specifies how to respond based on product availability, it lacks the structured output format, making it less suitable for systems that require formatted data for further processing.
Given the requirements for clear, programmatically usable outputs, Option B is the optimal choice because it provides precise instructions on how to respond and includes a JSON format example for structuring the output, which is ideal for automated systems or further data handling.
What is an effective method to preprocess prompts using custom code before sending them to an LLM?
- A . Directly modify the LLM’s internal architecture to include preprocessing steps
- B . It is better not to introduce custom code to preprocess prompts as the LLM has not been trained with examples of the preprocessed prompts
- C . Rather than preprocessing prompts, it’s more effective to postprocess the LLM outputs to align the outputs to desired outcomes
- D . Write a MLflow PyFunc model that has a separate function to process the prompts
D
Explanation:
The most effective way to preprocess prompts using custom code is to write a custom model, such as an MLflow PyFunc model. Here’s a breakdown of why this is the correct approach:
MLflow PyFunc Models:
MLflow is a widely used platform for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment. A PyFunc model is a generic Python function model that can implement custom logic, which includes preprocessing prompts.
Preprocessing Prompts:
Preprocessing could include various tasks like cleaning up the user input, formatting it according to specific rules, or augmenting it with additional context before passing it to the LLM. Writing this preprocessing as part of a PyFunc model allows the custom code to be managed, tested, and deployed easily.
Modular and Reusable:
By separating the preprocessing logic into a PyFunc model, the system becomes modular, making it easier to maintain and update without needing to modify the core LLM or retrain it.
Why Other Options Are Less Suitable:
A (Modify LLM’s Internal Architecture): Directly modifying the LLM’s architecture is highly impractical and can disrupt the model’s performance. LLMs are typically treated as black-box models for tasks like prompt processing.
B (Avoid Custom Code): While it’s true that LLMs haven’t been explicitly trained with preprocessed prompts, preprocessing can still improve clarity and alignment with desired input formats without confusing the model.
C (Postprocessing Outputs): While postprocessing the output can be useful, it doesn’t address the need for clean and well-formatted inputs, which directly affect the quality of the model’s responses.
Thus, using an MLflow PyFunc model allows for flexible and controlled preprocessing of prompts in a scalable way, making it the most effective method.
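A minimal sketch of this pattern is shown below; the cleaning rules and the stubbed-out LLM call are illustrative assumptions.

```python
# Minimal sketch: an MLflow PyFunc model with a separate prompt-preprocessing function.
# The actual LLM call is stubbed out; the preprocessing rules are illustrative only.
import mlflow
import mlflow.pyfunc


class PromptPreprocessingModel(mlflow.pyfunc.PythonModel):
    def _preprocess(self, prompt: str) -> str:
        """Custom preprocessing: collapse whitespace and prepend a fixed instruction."""
        cleaned = " ".join(prompt.split())
        return f"Answer concisely.\n\n{cleaned}"

    def predict(self, context, model_input):
        # Assumption: model_input is a pandas DataFrame with a 'prompt' column.
        preprocessed = [self._preprocess(p) for p in model_input["prompt"]]
        # In a real application the preprocessed prompts would be sent to an LLM endpoint here.
        return preprocessed


# Log the wrapper so it can be registered and served like any other MLflow model.
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="prompt_preprocessor",
        python_model=PromptPreprocessingModel(),
    )
```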