Practice Free DAS-C01 Exam Online Questions
A company’s marketing team has asked for help in identifying a high-performing, long-term storage service for its data, based on the following requirements:
The data size is approximately 32 TB uncompressed.
There is a low volume of single-row inserts each day.
There is a high volume of aggregation queries each day.
Multiple complex joins are performed.
The queries typically involve a small subset of the columns in a table.
Which storage service will provide the MOST performant solution?
- A . Amazon Aurora MySQL
- B . Amazon Redshift
- C . Amazon Neptune
- D . Amazon Elasticsearch Service
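The workload above, heavy daily aggregations and complex joins that touch only a few columns of a wide table, is exactly what the columnar storage in Amazon Redshift (option B) is built for. A minimal sketch of such a query using the open-source redshift_connector driver; the endpoint, credentials, and table layout are all hypothetical:

```python
# Hypothetical connection details; use Secrets Manager or IAM auth in practice.
import redshift_connector

conn = redshift_connector.connect(
    host="analytics-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="marketing",
    user="analyst",
    password="...",
)

cur = conn.cursor()
# An aggregation with a join that reads only 3 of the table's columns.
# Columnar storage means only those columns' blocks are scanned,
# not the full 32 TB.
cur.execute("""
    SELECT c.region, SUM(o.revenue) AS total_revenue
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    GROUP  BY c.region
""")
print(cur.fetchall())
```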
A company has a business unit uploading .csv files to an Amazon S3 bucket. The company’s data platform team has set up an AWS Glue crawler to perform data discovery and to create tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles the column mapping and creates the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason during the day, duplicate records are introduced into the Amazon Redshift table.
Which solution will update the Redshift table without duplicates when jobs are rerun?
- A . Modify the AWS Glue job to copy the rows into a staging table. Add SQL commands to replace the existing rows in the main table as postactions in the DynamicFrameWriter class.
- B . Load the previously inserted data into a MySQL database in the AWS Glue job. Perform an upsert operation in MySQL, and copy the results to the Amazon Redshift table.
- C . Use Apache Spark’s DataFrame dropDuplicates() API to eliminate duplicates and then write the data to Amazon Redshift.
- D . Use the AWS Glue ResolveChoice built-in transform to select the most recent value of the column.
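A minimal sketch of the staging-table pattern from option A, as a Glue (PySpark) job step. The catalog database, table names, Glue connection name, and key column are all assumptions; the merge itself happens in Redshift through the preactions and postactions SQL:

```python
# Glue (PySpark) job fragment: load into a staging table, then replace
# matching rows in the main table via postactions SQL.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Source table created by the crawler (assumed names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",
    table_name="daily_csv",
)

# Delete-and-insert upsert executed in Redshift after the staging load,
# which is what makes a rerun idempotent.
post_actions = (
    "BEGIN;"
    "DELETE FROM public.orders USING public.orders_staging "
    "WHERE public.orders.order_id = public.orders_staging.order_id;"
    "INSERT INTO public.orders SELECT * FROM public.orders_staging;"
    "DROP TABLE public.orders_staging;"
    "END;"
)

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="redshift-conn",  # assumed Glue connection name
    connection_options={
        "database": "dev",
        "dbtable": "public.orders_staging",  # rows land in the staging table
        "preactions": (
            "DROP TABLE IF EXISTS public.orders_staging;"
            "CREATE TABLE public.orders_staging (LIKE public.orders);"
        ),
        "postactions": post_actions,
    },
    redshift_tmp_dir="s3://my-temp-bucket/redshift/",  # assumed temp location
)
```

Because the staged rows delete and replace any main-table rows with the same key, rerunning the job produces the same end state instead of duplicates.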
A machinery company wants to collect data from sensors. A data analytics specialist needs to implement a solution that aggregates the data in near-real time and saves the data to a persistent data store. The data must be stored in nested JSON format and must be queried from the data store with a latency of single-digit milliseconds.
Which solution will meet these requirements?
- A . Use Amazon Kinesis Data Streams to receive the data from the sensors. Use Amazon Kinesis Data Analytics to read the stream, aggregate the data, and send the data to an AWS Lambda function. Configure the Lambda function to store the data in Amazon DynamoDB.
- B . Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use Amazon Kinesis Data Analytics to aggregate the data. Use an AWS Lambda function to read the data from Kinesis Data Analytics and store the data in Amazon S3.
- C . Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use an AWS Lambda function to aggregate the data during capture. Store the data from Kinesis Data Firehose in Amazon DynamoDB.
- D . Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use an AWS Lambda function to aggregate the data during capture. Store the data in Amazon S3.
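A minimal sketch of the last hop in option A: a Lambda function configured as the Kinesis Data Analytics destination, writing the aggregated rows to DynamoDB, which serves reads at single-digit-millisecond latency. The table name, key attributes, and record layout are assumptions:

```python
# Lambda destination for the Kinesis Data Analytics application (option A).
import base64
import json
from decimal import Decimal

import boto3

table = boto3.resource("dynamodb").Table("SensorAggregates")  # assumed table

def lambda_handler(event, context):
    for record in event["records"]:
        # Aggregated row emitted by Kinesis Data Analytics, base64-encoded.
        # parse_float=Decimal because DynamoDB does not accept Python floats.
        row = json.loads(base64.b64decode(record["data"]), parse_float=Decimal)
        table.put_item(Item={
            "sensor_id": row["sensor_id"],    # assumed partition key
            "window_end": row["window_end"],  # assumed sort key
            "metrics": row["metrics"],        # nested JSON stored as a DynamoDB map
        })
    # Acknowledge every record back to Kinesis Data Analytics.
    return {"records": [
        {"recordId": r["recordId"], "result": "Ok"} for r in event["records"]
    ]}
```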
An event ticketing website has a data lake on Amazon S3 and a data warehouse on Amazon Redshift.
Two datasets exist: events data and sales data. Each dataset has millions of records.
The entire events dataset is frequently accessed and is stored in Amazon Redshift. However, only the last 6 months of sales data is frequently accessed and is stored in Amazon Redshift. The rest of the sales data is available only in Amazon S3.
A data analytics specialist must create a report that shows the total revenue that each event has generated in the last 12 months. The report will be accessed thousands of times each week.
Which solution will meet these requirements with the LEAST operational effort?
- A . Create an AWS Glue job to access sales data that is older than 6 months from Amazon S3 and to access event and sales data from Amazon Redshift. Load the results into a new table in Amazon Redshift.
- B . Create a stored procedure to copy sales data that is older than 6 months and newer than 12 months from Amazon S3 to Amazon Redshift. Create a materialized view with the autorefresh option.
- C . Create an AWS Lambda function to copy sales data that is older than 6 months and newer than 12 months to an Amazon Kinesis Data Firehose delivery stream. Specify Amazon Redshift as the destination of the delivery stream. Create a materialized view with the autorefresh option.
- D . Create a materialized view in Amazon Redshift with the autorefresh option. Use Amazon Redshift Spectrum to include sales data that is older than 6 months.
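A minimal sketch of option D, issued through the Redshift Data API. The external schema "spectrum" (backed by the S3 sales data), the table and column names, and the cluster settings are all assumptions; note that Redshift restricts auto refresh for materialized views that reference external tables, in which case a scheduled REFRESH MATERIALIZED VIEW gives the same result:

```python
import boto3

rsd = boto3.client("redshift-data")

sql = """
CREATE MATERIALIZED VIEW mv_event_revenue
AUTO REFRESH YES
AS
SELECT e.event_id,
       SUM(s.amount) AS total_revenue
FROM   events e
JOIN   (SELECT event_id, amount, sale_date
        FROM   sales              -- last 6 months, local to Redshift
        UNION ALL
        SELECT event_id, amount, sale_date
        FROM   spectrum.sales     -- older data read from S3 via Spectrum
       ) s ON s.event_id = e.event_id
WHERE  s.sale_date >= DATEADD(month, -12, CURRENT_DATE)
GROUP  BY e.event_id;
"""

rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # assumed cluster name
    Database="dev",
    DbUser="admin",
    Sql=sql,
)
```

Precomputing the join once in a materialized view keeps the thousands of weekly report reads cheap and requires no pipeline to maintain.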
A company receives datasets from partners at various frequencies. The datasets include baseline data and incremental data. The company needs to merge and store all the datasets without reprocessing the data.
Which solution will meet these requirements with the LEAST development effort?
- A . Use an AWS Glue job with a temporary table to process the datasets. Store the data in an Amazon RDS table.
- B . Use an Apache Spark job in an Amazon EMR cluster to process the datasets. Store the data in EMR File System (EMRFS).
- C . Use an AWS Glue job with job bookmarks enabled to process the datasets. Store the data in Amazon S3.
- D . Use an AWS Lambda function to process the datasets. Store the data in Amazon S3.
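A minimal sketch of option C: a Glue job with bookmarks enabled (the job argument --job-bookmark-option set to job-bookmark-enable), so reruns pick up only new incremental files instead of reprocessing the baseline. The catalog names and output bucket are assumptions:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# transformation_ctx is the key the bookmark state is stored under;
# without it the source would be re-read in full on every run.
datasets = glue_context.create_dynamic_frame.from_catalog(
    database="partner_db",          # assumed catalog database
    table_name="partner_datasets",  # assumed crawled table
    transformation_ctx="datasets",
)

glue_context.write_dynamic_frame.from_options(
    frame=datasets,
    connection_type="s3",
    connection_options={"path": "s3://merged-datasets/output/"},  # assumed bucket
    format="parquet",
    transformation_ctx="sink",
)

job.commit()  # advances the bookmark so processed input is never reprocessed
```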
A company launched a service that produces millions of messages every day and uses Amazon Kinesis Data Streams as the streaming service.
The company uses the Kinesis SDK to write data to Kinesis Data Streams. A few months after launch, a data analyst found that write performance is significantly reduced. The data analyst investigated the metrics and determined that Kinesis is throttling the write requests. The data analyst wants to address this issue without significant changes to the architecture.
Which actions should the data analyst take to resolve this issue? (Choose two.)
- A . Increase the Kinesis Data Streams retention period to reduce throttling.
- B . Replace the Kinesis API-based data ingestion mechanism with Kinesis Agent.
- C . Increase the number of shards in the stream using the UpdateShardCount API.
- D . Choose partition keys in a way that results in a uniform record distribution across shards.
- E . Customize the application code to include retry logic to improve performance.
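A minimal sketch of the two remedies, options C and D, using boto3. The stream name and shard counts are illustrative:

```python
import json
import uuid

import boto3

kinesis = boto3.client("kinesis")

# Option C: add shards to raise aggregate write capacity; each shard accepts
# up to 1 MB/s or 1,000 records/s of writes.
kinesis.update_shard_count(
    StreamName="events-stream",   # assumed stream name
    TargetShardCount=8,           # illustrative; current count assumed to be 4
    ScalingType="UNIFORM_SCALING",
)

# Option D: a high-cardinality partition key spreads records evenly across
# shards instead of concentrating writes on a hot shard.
kinesis.put_record(
    StreamName="events-stream",
    Data=json.dumps({"message": "hello"}).encode("utf-8"),
    PartitionKey=str(uuid.uuid4()),
)
```

Both changes leave the architecture intact: capacity goes up with the shard count, and the uniform keys ensure no single shard is throttled while others sit idle.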
A media content company has a streaming playback application. The company wants to collect and analyze the data to provide near-real-time feedback on playback issues. The company needs to consume this data and return results within 30 seconds, according to the service-level agreement (SLA). The consumer must identify playback issues, such as degraded quality during a specified timeframe. The data will be emitted as JSON and its schema may change over time.
Which solution will allow the company to collect data for processing while meeting these requirements?
- A . Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure an S3 event to trigger an AWS Lambda function to process the data. The Lambda function will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon S3.
- B . Send the data to Amazon Managed Streaming for Apache Kafka (Amazon MSK) and configure an Amazon Kinesis Analytics for Java application as the consumer. The application will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon DynamoDB.
- C . Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure an S3 event to trigger an AWS Lambda function to process the data. The Lambda function will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon DynamoDB.
- D . Send the data to Amazon Kinesis Data Streams and configure an Amazon Kinesis Analytics for Java application as the consumer. The application will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon S3.
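A minimal sketch of the producer side of option D: playback events go into a Kinesis data stream as plain JSON, so the schema can evolve without breaking ingestion, and a Kinesis Data Analytics for Java (Flink) application consumes them downstream within the 30-second SLA. The stream name and event fields are assumptions:

```python
import json
import time

import boto3

kinesis = boto3.client("kinesis")

# A playback event as plain JSON; fields can be added or removed over time
# without breaking ingestion (all names are illustrative).
event = {
    "session_id": "abc-123",
    "timestamp_ms": int(time.time() * 1000),
    "bitrate_kbps": 2400,
    "buffering_ms": 350,
}

kinesis.put_record(
    StreamName="playback-events",            # assumed stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["session_id"],        # keeps one session's events in order
)
```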
A manufacturing company has been collecting IoT sensor data from devices on its factory floor for a year and is storing the data in Amazon Redshift for daily analysis. A data analyst has determined that, at an expected ingestion rate of about 2 TB per day, the cluster will be undersized in less than 4 months. A long-term solution is needed. The data analyst has indicated that most queries only reference the most recent 13 months of data, yet there are also quarterly reports that need to query all the data generated from the past 7 years. The chief technology officer (CTO) is concerned about the costs, administrative effort, and performance of a long-term solution.
Which solution should the data analyst use to meet these requirements?
- A . Create a daily job in AWS Glue to UNLOAD records older than 13 months to Amazon S3 and delete those records from Amazon Redshift. Create an external table in Amazon Redshift to point to the S3 location. Use Amazon Redshift Spectrum to join to data that is older than 13 months.
- B . Take a snapshot of the Amazon Redshift cluster. Restore the cluster to a new cluster using dense storage nodes with additional storage capacity.
- C . Execute a CREATE TABLE AS SELECT (CTAS) statement to move records that are older than 13 months to quarterly partitioned data in Amazon Redshift Spectrum backed by Amazon S3.
- D . Unload all the tables in Amazon Redshift to an Amazon S3 bucket using S3 Intelligent-Tiering. Use AWS Glue to crawl the S3 bucket location to create external tables in an AWS Glue Data Catalog. Create an Amazon EMR cluster using Auto Scaling for any daily analytics needs, and use Amazon Athena for the quarterly reports, with both using the same AWS Glue Data Catalog.
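A minimal sketch of the daily archival step from option A, issued through the Redshift Data API (the same SQL could run inside the Glue job). The cluster, table, bucket, and IAM role names are assumptions, and reading_date is assumed to be a column of the table:

```python
import time

import boto3

rsd = boto3.client("redshift-data")

def run(sql):
    """Submit a statement and block until it finishes (the Data API is async)."""
    stmt = rsd.execute_statement(
        ClusterIdentifier="iot-cluster",  # assumed cluster name
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
    while True:
        desc = rsd.describe_statement(Id=stmt["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            return desc
        time.sleep(5)

# 1) Archive rows older than 13 months to S3 as partitioned Parquet.
run("""
    UNLOAD ('SELECT * FROM public.sensor_readings
             WHERE reading_ts < DATEADD(month, -13, CURRENT_DATE)')
    TO 's3://sensor-archive/readings/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
    FORMAT AS PARQUET
    PARTITION BY (reading_date)
""")

# 2) Only after the unload succeeds, delete the archived rows from the cluster.
run("""
    DELETE FROM public.sensor_readings
    WHERE reading_ts < DATEADD(month, -13, CURRENT_DATE)
""")
```

With an external table over the archive location, Redshift Spectrum lets the quarterly reports join hot and cold data in a single query while the cluster itself holds only 13 months.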
A company’s data analyst needs to ensure that queries executed in Amazon Athena cannot scan more than a prescribed amount of data for cost control purposes. Queries that exceed the prescribed threshold must be canceled immediately.
What should the data analyst do to achieve this?
- A . Configure Athena to invoke an AWS Lambda function that terminates queries when the prescribed threshold is crossed.
- B . For each workgroup, set the control limit for each query to the prescribed threshold.
- C . Enforce the prescribed threshold on all Amazon S3 bucket policies.
- D . For each workgroup, set the workgroup-wide data usage control limit to the prescribed threshold.
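A minimal sketch of option D using boto3. The workgroup name and the 1 TiB threshold are illustrative; Athena cancels any query in the workgroup whose scanned bytes exceed BytesScannedCutoffPerQuery:

```python
import boto3

athena = boto3.client("athena")

athena.update_work_group(
    WorkGroup="analysts",  # assumed workgroup name
    ConfigurationUpdates={
        # Queries scanning more than this many bytes are canceled (1 TiB here).
        "BytesScannedCutoffPerQuery": 1_099_511_627_776,
    },
)
```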
A company wants to collect and process events data from different departments in near-real time. Before storing the data in Amazon S3, the company needs to clean the data by standardizing the format of the address and timestamp columns. The data varies in size based on the overall load at each particular point in time. A single data record can be 100 KB to 10 MB.
How should a data analytics specialist design the solution for data ingestion?
- A . Use Amazon Kinesis Data Streams. Configure a stream for the raw data. Use a Kinesis Agent to write data to the stream. Create an Amazon Kinesis Data Analytics application that reads data from the raw stream, cleanses it, and stores the output to Amazon S3.
- B . Use Amazon Kinesis Data Firehose. Configure a Firehose delivery stream with a preprocessing AWS Lambda function for data cleansing. Use a Kinesis Agent to write data to the delivery stream. Configure Kinesis Data Firehose to deliver the data to Amazon S3.
- C . Use Amazon Managed Streaming for Apache Kafka. Configure a topic for the raw data. Use a Kafka producer to write data to the topic. Create an application on Amazon EC2 that reads data from the topic by using the Apache Kafka consumer API, cleanses the data, and writes to Amazon S3.
- D . Use Amazon Simple Queue Service (Amazon SQS). Configure an AWS Lambda function to read events from the SQS queue and upload the events to Amazon S3.
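A minimal sketch of the preprocessing Lambda function from option B. Kinesis Data Firehose invokes it with base64-encoded records, and each record must be returned with the same recordId, a result of Ok (or Dropped or ProcessingFailed), and re-encoded data. The cleansing rules shown are placeholders:

```python
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Placeholder cleansing: the real standardization rules for the
        # address and timestamp columns would go here.
        payload["address"] = str(payload.get("address", "")).strip().upper()
        payload["timestamp"] = str(payload.get("timestamp", "")).strip()

        output.append({
            "recordId": record["recordId"],  # must echo the incoming recordId
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                json.dumps(payload).encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```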