Practice Free MLA-C01 Exam Online Questions
You are a data scientist working for a healthcare company that develops predictive models for diagnosing diseases based on patient data. Due to regulatory requirements and the critical nature of healthcare decisions, model interpretability is a top priority. The company needs to ensure that the predictions made by the model can be explained to both medical professionals and regulatory bodies. You are evaluating different algorithms in Amazon SageMaker for your model, balancing the trade-off between accuracy and interpretability. The initial trials show that more complex models like deep neural networks (DNNs) yield higher accuracy but are less interpretable, whereas simpler models like logistic regression provide clearer insights but may not perform as well on the dataset.
Given these considerations, which of the following approaches is MOST APPROPRIATE for achieving both interpretability and acceptable performance?
- A . Select a deep neural network (DNN) model and use SHAP (SHapley Additive exPlanations) to provide interpretability
- B . Use a tree-based algorithm like XGBoost, which offers a balance between accuracy and interpretability with feature importance
- C . Choose a logistic regression model due to its high interpretability and supplement it with additional data preprocessing to improve accuracy
- D . Deploy an ensemble of models including a complex model for accuracy and a simpler model for interpretability, using model stacking in SageMaker
B
Explanation:
Correct option:
Use a tree-based algorithm like XGBoost, which offers a balance between accuracy and interpretability with feature importance
XGBoost (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that tries to accurately predict a target variable by combining multiple estimates from a set of simpler models. The XGBoost algorithm performs well in machine learning competitions for the following reasons:
- Its robust handling of a variety of data types, relationships, and distributions.
- The variety of hyperparameters that you can fine-tune.
via – https://docs.aws.amazon.com/whitepapers/latest/model-explainability-aws-ai-ml/interpretability-versus-explainability.html
XGBoost is a tree-based algorithm that naturally provides feature importance, making it easier to interpret which features are influencing the model’s predictions. This strikes a balance between achieving high accuracy and maintaining interpretability, which is crucial in healthcare applications. Therefore, this represents the best option for the given use case.
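As a concrete illustration of tree-based feature importance, here is a minimal sketch using the open-source xgboost Python package (not the SageMaker built-in container) on synthetic data with hypothetical feature names:

```python
# Minimal sketch: train an XGBoost classifier locally and inspect feature importance.
# The data and feature names below are hypothetical placeholders.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))                 # e.g. age, blood_pressure, glucose, bmi
y = (X[:, 2] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic diagnosis label

model = xgb.XGBClassifier(
    n_estimators=100, max_depth=3, importance_type="gain", eval_metric="logloss"
)
model.fit(X, y)

feature_names = ["age", "blood_pressure", "glucose", "bmi"]
for name, score in zip(feature_names, model.feature_importances_):
    print(f"{name}: {score:.3f}")              # gain-based importance per feature
```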
Incorrect options:
Select a deep neural network (DNN) model and use SHAP (SHapley Additive exPlanations) to provide interpretability – You can use Shapley values to determine the contribution that each feature made to model predictions. These attributions can be provided for specific predictions and at a global level for the model as a whole. For example, if you used an ML model for college admissions, the explanations could help determine whether the GPA or the SAT score was the feature most responsible for the model’s predictions, and then you can determine how responsible each feature was for determining an admission decision about a particular student.
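SageMaker Clarify provides these Shapley-value attributions as a managed capability; as a purely local illustration of the same idea, here is a minimal sketch using the open-source shap package on a hypothetical tree model trained on synthetic data:

```python
# Minimal sketch: per-prediction feature attributions with SHAP.
# The model and data are hypothetical placeholders.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss").fit(X, y)

explainer = shap.TreeExplainer(model)        # efficient explainer for tree ensembles
shap_values = explainer.shap_values(X[:10])  # contribution of each feature to 10 predictions
print(shap_values[0])                        # per-feature attributions for the first prediction
```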
While SHAP (Shapley values) can provide interpretability for complex models like DNNs, the explanations can be more challenging to understand for non-technical stakeholders, especially in a high-stakes environment like healthcare. This approach may not fully address the interpretability requirement.
Choose a logistic regression model due to its high interpretability and supplement it with additional data preprocessing to improve accuracy – Logistic regression is highly interpretable, but it may not perform well on complex datasets compared to more sophisticated algorithms. While data preprocessing can improve its performance, there is often a significant trade-off in accuracy.
Deploy an ensemble of models including a complex model for accuracy and a simpler model for interpretability, using model stacking in SageMaker – Using an ensemble model may complicate interpretability, especially when combining different types of models. Although this approach might improve accuracy, the resulting complexity could hinder the ability to explain predictions clearly.
References:
https://docs.aws.amazon.com/prescriptive-guidance/latest/ml-model-interpretability/overview.html
https://docs.aws.amazon.com/whitepapers/latest/model-explainability-aws-ai-ml/interpretability-versus-explainability.html
https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-shapley-values.html
When is unexplainability not acceptable?
- A . When determining the result of a sports match
- B . When making product recommendations
- C . When explaining why a loan was declined
- D . When explaining why a transaction was deemed fraudulent
C, D
Explanation:
Unexplainability is not acceptable in scenarios where the reasons behind decisions must be clear, such as explaining loan denials or flagged transactions.
You are responsible for deploying a machine learning model on AWS SageMaker for a real-time prediction application. The application requires low latency and high throughput. During deployment, you notice that the model’s response time is slower than expected, and the throughput is not meeting the required levels. You have already optimized the model itself, so the next step is to optimize the deployment environment. You are currently using a single instance of the ml.m5.large instance type with the default endpoint configuration.
Which of the following changes is MOST LIKELY to improve the model’s response time and throughput?
- A . Change the instance type to ml.p2.xlarge and add multi-model support
- B . Enable Auto Scaling with a target metric for the instance utilization
- C . Switch to an ml.m5.2xlarge instance type and use multi-AZ deployment
- D . Increase the instance count to two and enable asynchronous inference
B
Explanation:
Correct option:
Enable Auto Scaling with a target metric for the instance utilization
Amazon SageMaker supports automatic scaling (auto scaling) for your hosted models. Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in your workload. When the workload increases, auto scaling brings more instances online. When the workload decreases, auto scaling removes unnecessary instances so that you don’t pay for provisioned instances that you aren’t using.
Enabling Auto Scaling allows the endpoint to dynamically adjust the number of instances based on actual traffic. By targeting instance utilization, the deployment can automatically scale out during peak times and scale in during low demand, improving both response time and throughput without over-provisioning. With target tracking, you choose an Amazon CloudWatch metric and target value. Auto scaling creates and manages the CloudWatch alarms for the scaling policy and calculates the scaling adjustment based on the metric and the target value. The policy adds and removes the number of instances as required to keep the metric at, or close to, the specified target value. For example, a scaling policy that uses the predefined InvocationsPerInstance metric with a target value of 70 can keep InvocationsPerInstance at, or close to 70.
via – https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-prerequisites.html
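A minimal boto3 sketch of the target-tracking configuration described above; the endpoint and variant names and capacity limits are hypothetical:

```python
# Minimal sketch: target-tracking auto scaling for a SageMaker endpoint variant.
# Endpoint/variant names and capacity limits are hypothetical.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # hypothetical

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="InvocationsTargetTracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # keep InvocationsPerInstance at, or close to, 70
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```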
Incorrect options:
Change the instance type to ml.p2.xlarge and add multi-model support – While changing to an ml.p2.xlarge instance type, which is optimized for GPU, could improve performance for compute-intensive models, it may not be necessary for all types of models, especially if the model is CPU-bound. Adding multi-model support may further complicate the deployment without addressing the core issue of latency and throughput. Multi-model endpoints provide a scalable and cost-effective solution to deploying large numbers of models. They use the same fleet of resources and a shared serving container to host all of your models. This reduces hosting costs by improving endpoint utilization compared with using single-model endpoints. It also reduces deployment overhead because Amazon SageMaker manages loading models in memory and scaling them based on the traffic patterns to your endpoint.
For a diagram comparing how multi-model endpoints work versus single-model endpoints, see https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html
Increase the instance count to two and enable asynchronous inference – Asynchronous inference is typically used when latency is less of a concern, which contradicts the requirements of real-time prediction. Increasing the instance count without addressing scalability could help throughput but may not effectively reduce latency.
Switch to an ml.m5.2xlarge instance type and use multi-AZ deployment – Switching to a more powerful ml.m5.2xlarge instance type and using multi-AZ deployment could improve performance, but this option mainly adds redundancy and fault tolerance rather than optimizing response time and throughput directly.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-prerequisites.html
https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html
You are a Data Scientist working for an e-commerce company that is developing a machine learning model to predict whether a customer will make a purchase based on their browsing behavior. You need to evaluate the model’s performance using different evaluation metrics to understand how well the model is predicting the positive class (i.e., customers who will make a purchase). The dataset is imbalanced, with a small percentage of customers making a purchase. Given this context, you must decide on the most appropriate evaluation techniques to assess your model’s effectiveness and identify potential areas for improvement.
Which of the following evaluation techniques and metrics should you prioritize when assessing the performance of your model, considering the dataset’s imbalance and the need for a comprehensive understanding of both false positives and false negatives? (Select two)
- A . Prioritize Root mean squared error (RMSE) as the key metric, as it measures the average magnitude of the errors between predicted and actual values
- B . Utilize the AUC-ROC curve to evaluate the model’s ability to distinguish between classes across various thresholds, particularly in the presence of class imbalance
- C . Evaluate the model using the confusion matrix, which provides insights into true positives, false positives, true negatives, and false negatives, allowing you to calculate additional metrics such as precision, recall, and F1 score
- D . Use accuracy as the primary metric, as it measures the percentage of correct predictions out of all predictions made by the model
- E . Use precision and recall to focus on the model’s ability to correctly identify positive cases while minimizing false positives and false negatives
C, E
Explanation:
Correct options:
Evaluate the model using the confusion matrix, which provides insights into true positives, false positives, true negatives, and false negatives, allowing you to calculate additional metrics such as precision, recall, and F1 score
The confusion matrix illustrates in a table the number or percentage of correct and incorrect predictions for each class by comparing an observation’s predicted class and its true class. The confusion matrix is crucial for understanding the detailed performance of your model, especially in an imbalanced dataset. It allows you to calculate additional metrics such as precision, recall, and F1 score, which are essential for understanding how well your model handles false positives and false negatives.
Use precision and recall to focus on the model’s ability to correctly identify positive cases while minimizing false positives and false negatives
Precision and recall are particularly important in an imbalanced dataset. Precision measures the proportion of true positive predictions among all positive predictions, while recall measures the proportion of actual positives that are correctly identified. Focusing on these metrics helps in assessing how well the model avoids false positives and false negatives, which is critical in your scenario.
via – https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-metrics-validation.html
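A minimal scikit-learn sketch showing how precision, recall, and F1 follow directly from the confusion matrix; the labels and predictions below are hypothetical:

```python
# Minimal sketch: confusion matrix and derived metrics for an imbalanced problem.
# y_true / y_pred are hypothetical outputs; 1 = customer makes a purchase.
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1:       ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```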
Incorrect options:
Use accuracy as the primary metric, as it measures the percentage of correct predictions out of all predictions made by the model – While accuracy is a common metric, it is not suitable for imbalanced datasets because it can be misleading. A model predicting the majority class most of the time can achieve high accuracy without effectively capturing the minority class (e.g., customers who make a purchase).
Prioritize Root mean squared error (RMSE) as the key metric, as it measures the average magnitude of the errors between predicted and actual values – RMSE is a regression metric, not suitable for classification problems. In this scenario, you are dealing with a classification task, so metrics like precision, recall, and F1 score are more appropriate.
Utilize the AUC-ROC curve to evaluate the model’s ability to distinguish between classes across various thresholds, particularly in the presence of class imbalance – The AUC-ROC curve is a useful tool, especially in imbalanced datasets. However, understanding the confusion matrix and calculating precision and recall provide more direct insights into the types of errors the model is making, which is crucial for improving the model’s performance in your specific context.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-metrics-validation.html
https://docs.aws.amazon.com/machine-learning/latest/dg/binary-classification.html
You are an ML Engineer working for a logistics company that uses multiple machine learning models to optimize delivery routes in real-time. Each model needs to process data quickly to provide up-to-the-minute route adjustments, but the company also has strict cost constraints. You need to deploy the models in an environment where performance, cost, and latency are carefully balanced. There may be slight variations in the access frequency of the models. Any excessive costs could impact the project’s profitability.
Which of the following strategies should you consider to balance the tradeoffs between performance, cost, and latency when deploying your model in Amazon SageMaker? (Select two)
- A . Choose a lower-cost CPU instance, accepting longer inference times, as the savings on compute costs are more important than minimizing latency
- B . Leverage Amazon SageMaker Neo to compile the model for optimized deployment on edge devices, reducing latency and cost but with limited scalability for large datasets
- C . Use Amazon SageMaker’s multi-model endpoint to deploy multiple models on a single instance, reducing costs by sharing resources
- D . Implement auto-scaling on a fleet of medium-sized instances, allowing the system to adjust resources based on real-time demand, balancing cost and performance dynamically
- E . Deploy the model on a high-performance GPU instance to minimize latency, regardless of the higher cost, ensuring real-time route adjustments
C, D
Explanation:
Correct options:
Use Amazon SageMaker’s multi-model endpoint to deploy multiple models on a single instance, reducing costs by sharing resources
Amazon SageMaker’s multi-model endpoint allows you to deploy multiple models on a single instance. This can significantly reduce costs by sharing resources among models, but it may introduce slight increases in latency due to the need to load the correct model into memory. This tradeoff can be acceptable if cost savings are a priority and latency requirements are not ultra-strict.
via – https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html
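A minimal sketch of invoking one model hosted behind a hypothetical multi-model endpoint; the TargetModel parameter tells SageMaker which model artifact to load and serve:

```python
# Minimal sketch: invoking a specific model on a multi-model endpoint.
# Endpoint name, model artifact key, and payload are hypothetical.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="route-optimizer-mme",            # hypothetical multi-model endpoint
    TargetModel="region-us-west/model-v3.tar.gz",  # which model artifact to serve
    ContentType="text/csv",
    Body="12.5,3,0.7,45",                          # hypothetical feature vector
)
print(response["Body"].read())
```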
Implement auto-scaling on a fleet of medium-sized instances, allowing the system to adjust resources based on real-time demand, balancing cost and performance dynamically
Auto-scaling allows you to dynamically adjust the number of instances based on demand, which helps balance performance and cost. During peak times, more instances can be provisioned to maintain low latency, while during off-peak times, fewer instances are used, reducing costs. This strategy offers a flexible way to manage the tradeoffs between performance, cost, and latency.
Incorrect options:
Deploy the model on a high-performance GPU instance to minimize latency, regardless of the higher cost, ensuring real-time route adjustments – While deploying on a high-performance GPU instance would minimize latency, it may not be cost-effective, especially if the model does not require the full computational power of a GPU. The high cost might outweigh the benefits of lower latency.
Choose a lower-cost CPU instance, accepting longer inference times, as the savings on compute costs are more important than minimizing latency – Choosing a lower-cost CPU instance could lead to unacceptable delays in route adjustments, which could impact delivery times. In this scenario, optimizing latency is critical, and sacrificing performance for cost could be detrimental to the business.
Leverage Amazon SageMaker Neo to compile the model for optimized deployment on edge devices, reducing latency and cost but with limited scalability for large datasets – While Amazon SageMaker Neo can optimize models for deployment on edge devices, it is not the best fit for this scenario. Neo is more suitable for low-latency, cost-effective deployments on devices with limited resources. In this scenario, the need for scalable, cloud-based infrastructure is more important.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html
https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html
You are a machine learning engineer at a financial services company responsible for maintaining a fraud detection model deployed on Amazon SageMaker. The model processes a high volume of real-time transactions and must respond with low latency to ensure a seamless customer experience. Recently, the model has experienced increased latency during peak traffic times, and there have been instances where requests were dropped due to insufficient capacity.
Which approach is the MOST EFFECTIVE for monitoring and resolving these latency and scaling issues in your ML solution?
- A . Configure Amazon CloudWatch to monitor key metrics such as invocation latency, model error rates, and CPU utilization. Set up CloudWatch Alarms to automatically trigger an increase in instance count when latency exceeds a predefined threshold, and enable auto-scaling on the SageMaker endpoint to handle traffic spikes
- B . Use Amazon SageMaker Model Monitor to track the performance of the model and identify data drift. Adjust the model’s hyperparameters based on the results to reduce latency and improve scalability during peak times
- C . Manually monitor the model’s performance using the SageMaker console, and manually increase the instance size whenever latency is detected, to ensure the model remains responsive during peak periods
- D . Implement a serverless architecture using AWS Lambda for model inference to eliminate latency and scaling issues. Use Amazon CloudFront to cache inference results for faster response times
A
Explanation:
Correct option:
Configure Amazon CloudWatch to monitor key metrics such as invocation latency, model error rates, and CPU utilization. Set up CloudWatch Alarms to automatically trigger an increase in instance count when latency exceeds a predefined threshold, and enable auto-scaling on the SageMaker endpoint to handle traffic spikes
This approach uses Amazon CloudWatch to monitor critical performance metrics such as latency, error rates, and CPU utilization, which are directly related to the model’s performance and ability to scale. By setting up CloudWatch Alarms and enabling auto-scaling on the SageMaker endpoint, you can automatically adjust resources to handle traffic spikes, reducing latency and preventing dropped requests. This solution provides a proactive and automated way to address both latency and scaling issues.
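As a sketch of the monitoring half of this approach, a CloudWatch alarm on the endpoint’s ModelLatency metric might look like the following; the endpoint name, threshold, and SNS topic ARN are hypothetical:

```python
# Minimal sketch: CloudWatch alarm on SageMaker endpoint latency.
# ModelLatency is reported in microseconds; names/ARNs below are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="fraud-endpoint-high-latency",
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "fraud-detector"},  # hypothetical endpoint
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=200_000,                         # 200 ms expressed in microseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # hypothetical ARN
)
```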
Incorrect options:
Use Amazon SageMaker Model Monitor to track the performance of the model and identify data drift. Adjust the model’s hyperparameters based on the results to reduce latency and improve scalability during peak times – SageMaker Model Monitor is useful for tracking model performance and detecting data drift, but it is not directly focused on resolving latency or scaling issues. Adjusting hyperparameters may improve model performance but does not address the need for scalable infrastructure during traffic spikes.
Implement a serverless architecture using AWS Lambda for model inference to eliminate latency and scaling issues. Use Amazon CloudFront to cache inference results for faster response times – Moving to a serverless architecture with AWS Lambda can simplify scaling but may not be suitable for high-throughput, low-latency ML models that require persistent instances. CloudFront is effective for caching static content but is not typically used for caching dynamic ML inference results.
Manually monitor the model’s performance using the SageMaker console, and manually increase the instance size whenever latency is detected, to ensure the model remains responsive during peak periods – Manually monitoring and scaling resources is inefficient and may not respond quickly enough to sudden traffic spikes. Automated monitoring and scaling are essential for maintaining performance in a real-time, high-volume environment.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html
https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-prerequisites.html
What is the bias versus variance trade-off in machine learning?
- A . The bias versus variance trade-off refers to the balance between underfitting and overfitting, where high bias leads to overfitting and high variance leads to underfitting
- B . The bias versus variance trade-off is a technique used to improve model performance by increasing both bias and variance simultaneously to achieve better generalization
- C . The bias versus variance trade-off refers to the challenge of balancing the error due to the model’s complexity (variance) and the error due to incorrect assumptions in the model (bias), where high bias can cause underfitting and high variance can cause overfitting
- D . The bias versus variance trade-off involves choosing between a model with high complexity that may capture more noise (high bias) and a simpler model that may generalize better but miss important patterns (high variance)
C
Explanation:
Correct option:
The bias versus variance trade-off refers to the challenge of balancing the error due to the model’s complexity (variance) and the error due to incorrect assumptions in the model (bias), where high bias can cause underfitting and high variance can cause overfitting
The bias versus variance trade-off in machine learning is about finding a balance between bias (error due to overly simplistic assumptions in the model, leading to underfitting) and variance (error due to the model being too sensitive to small fluctuations in the training data, leading to overfitting). The goal is to achieve a model that generalizes well to new data.
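A minimal, self-contained sketch of the trade-off using polynomial regression on synthetic data, where a low degree underfits (high bias) and a very high degree overfits (high variance):

```python
# Minimal sketch: bias-variance trade-off via polynomial degree on synthetic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

for degree in (1, 4, 15):  # underfit (high bias), balanced, overfit (high variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"degree={degree:2d}  cross-validated MSE={mse:.3f}")
```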
Incorrect options:
The bias versus variance trade-off refers to the balance between underfitting and overfitting, where high bias leads to overfitting and high variance leads to underfitting – High bias leads to underfitting, not overfitting, and high variance leads to overfitting, not underfitting.
The bias versus variance trade-off involves choosing between a model with high complexity that may capture more noise (high bias) and a simpler model that may generalize better but miss important patterns (high variance) – The explanation reverses the definitions of bias and variance. High complexity leads to high variance, and simpler models typically have higher bias.
The bias versus variance trade-off is a technique used to improve model performance by increasing both bias and variance simultaneously to achieve better generalization – Increasing both bias and variance simultaneously does not improve model performance; the key is to balance them to minimize total error.
References:
https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/mlper-09.html
https://aws.amazon.com/what-is/overfitting/
You are a data scientist at an insurance company that uses a machine learning model to assess the risk of potential clients and set insurance premiums accordingly. The model was trained on data from the past few years, but recently, the company has expanded its services to new regions with different demographic characteristics. You are concerned that these changes in the data distribution might affect the model’s performance and lead to biased or inaccurate predictions. To address this, you decide to use Amazon SageMaker Clarify to monitor and detect any significant shifts in data distribution that could impact the model.
Which of the following actions is the MOST EFFECTIVE for detecting changes in data distribution using SageMaker Clarify and mitigating their impact on model performance?
- A . Set up a continuous monitoring job with SageMaker Clarify to track changes in feature distribution over time and alert you when a significant feature attribution drift is detected, allowing you to investigate and potentially retrain the model
- B . Implement a random sampling process to manually review a subset of incoming data each month, comparing it with the original training data to check for distribution changes
- C . Use SageMaker Clarify’s bias detection capabilities to analyze the model’s output and identify any disparities between different demographic groups, retraining the model only if significant bias is detected
- D . Use SageMaker Clarify to perform a one-time bias analysis during model training, ensuring that the model is initially fair and accurate, and manually monitor future data distribution changes
A
Explanation:
Correct option:
Set up a continuous monitoring job with SageMaker Clarify to track changes in feature distribution over time and alert you when a significant feature attribution drift is detected, allowing you to investigate and potentially retrain the model
A drift in the distribution of live data for models in production can result in a corresponding drift in the feature attribution values, just as it could cause a drift in bias when monitoring bias metrics. Amazon SageMaker Clarify feature attribution monitoring helps data scientists and ML engineers monitor predictions for feature attribution drift on a regular basis.
Continuous monitoring with SageMaker Clarify is the most effective approach for detecting changes in data distribution. By tracking feature distributions over time, you can identify when a significant shift occurs, investigate its impact on model performance, and decide if retraining is necessary. This proactive approach helps ensure that your model remains accurate and fair as the underlying data evolves.
via – https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html
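Feature attribution drift monitoring follows the same baseline-then-schedule pattern as the other SageMaker Model Monitor types. The snippet below is a simplified sketch of that pattern using DefaultModelMonitor (data-quality drift); ModelExplainabilityMonitor is configured analogously, and the role, bucket, and endpoint names are hypothetical:

```python
# Minimal sketch: baseline a training dataset, then schedule ongoing drift checks
# against a live endpoint. Role, S3 paths, and endpoint name are hypothetical.
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/training-data/train.csv",  # hypothetical
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline",
)

monitor.create_monitoring_schedule(
    monitor_schedule_name="risk-model-drift",
    endpoint_input="risk-model-endpoint",                       # hypothetical endpoint
    output_s3_uri="s3://my-bucket/monitor/reports",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.daily(),
)
```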
Incorrect options:
Use SageMaker Clarify’s bias detection capabilities to analyze the model’s output and identify any disparities between different demographic groups, retraining the model only if significant bias is detected – While SageMaker Clarify’s bias detection is useful, focusing solely on bias in the model’s output doesn’t address the broader issue of shifts in feature distribution that can impact overall model performance. Continuous monitoring is needed to detect such changes proactively.
Implement a random sampling process to manually review a subset of incoming data each month, comparing it with the original training data to check for distribution changes – Manual reviews of data can be labor-intensive, error-prone, and may not catch distribution changes in a timely manner. Automated monitoring with SageMaker Clarify is more efficient and reliable.
Use SageMaker Clarify to perform a one-time bias analysis during model training, ensuring that the model is initially fair and accurate, and manually monitor future data distribution changes – A one-time bias analysis during training helps ensure initial fairness, but it doesn’t address ongoing changes in data distribution after the model is deployed. Continuous monitoring is necessary to maintain model performance over time.
Reference:
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html
You are a data scientist working on a loan approval model for a bank. The model predicts whether a loan application should be approved or rejected based on various features such as income, credit score, and employment history. The bank is particularly concerned about ensuring that the model is fair and does not discriminate against any demographic group, such as age, gender, or ethnicity. To address this, you need to select the appropriate evaluation metrics to assess both the model’s performance and any potential bias.
Given these requirements, which combination of evaluation metrics and bias detection methods is MOST APPROPRIATE for ensuring fair and accurate model predictions?
- A . Measure the model’s performance using accuracy and AUC-ROC, and check for bias by comparing feature distributions across demographic groups
- B . Evaluate the model using F1 score and AUC-ROC, and check for bias by comparing feature distributions across demographic groups
- C . Use accuracy as the primary evaluation metric and perform feature importance analysis to ensure that the model’s decisions are driven by relevant features
- D . Evaluate the model using F1 score and AUC-ROC, and assess whether the model has similar true positive rates across different demographic groups
D
Explanation:
Correct option:
Evaluate the model using F1 score and AUC-ROC, and assess whether the model has similar true positive rates across different demographic groups
The F1 score balances precision and recall, making it suitable for imbalanced datasets, while AUC-ROC provides a comprehensive view of model performance across different thresholds. Assessing whether the model achieves similar true positive rates across different demographic groups is a key fairness check for detecting bias in sensitive applications like loan approvals.
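A minimal sketch of the fairness check itself: computing the true positive rate (recall) for each demographic group and comparing them, using hypothetical labels and group assignments:

```python
# Minimal sketch: per-group true positive rate comparison.
# Labels, predictions, and group membership are hypothetical.
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 1, 1, 0])
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

for g in np.unique(group):
    mask = group == g
    print(f"group {g}: true positive rate = {recall_score(y_true[mask], y_pred[mask]):.2f}")
# A large gap between groups (here 0.67 vs 1.00) indicates the model identifies
# positives more reliably for one group, a potential fairness concern.
```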
Incorrect options:
Evaluate the model using F1 score and AUC-ROC, and check for bias by comparing feature distributions across demographic groups
Measure the model’s performance using accuracy and AUC-ROC, and check for bias by comparing feature distributions across demographic groups
Simply comparing feature distributions across demographic groups does not directly address potential bias in model predictions. So, both these options are incorrect.
Use accuracy as the primary evaluation metric and perform feature importance analysis to ensure that the model’s decisions are driven by relevant features – Accuracy alone is not sufficient in scenarios with imbalanced data or where fairness is a concern. Feature importance analysis helps explain model predictions but does not directly address bias across demographic groups.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-data-bias-metric-cddl.html
https://aws.amazon.com/blogs/machine-learning/learn-how-amazon-sagemaker-clarify-helps-detect-bias/
A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance.
Which approach allows the Specialist to use all the data to train the model?
- A . Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to train the full dataset.
- B . Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
- C . Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
- D . Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to Amazon SageMaker and train using the full dataset.
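Options B and C both rely on Pipe input mode, which streams training data from Amazon S3 to the training container instead of downloading the full dataset to the instance’s storage first. A minimal sketch, assuming a hypothetical custom training image, role, and S3 prefix:

```python
# Minimal sketch: SageMaker training job with Pipe input mode.
# Image URI, role ARN, and S3 paths are hypothetical placeholders.
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/video-recsys:latest",  # hypothetical
    role="arn:aws:iam::123456789012:role/SageMakerRole",                            # hypothetical
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    input_mode="Pipe",  # stream data from S3 rather than copying it to local disk
)

train_input = TrainingInput(
    s3_data="s3://my-bucket/video-interactions/",  # hypothetical dataset prefix
    content_type="application/x-recordio-protobuf",
    input_mode="Pipe",
)

estimator.fit({"train": train_input})
```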