Practice Free DAS-C01 Exam Online Questions
A company is migrating from an on-premises Apache Hadoop cluster to an Amazon EMR cluster. The cluster runs only during business hours. Due to a company requirement to avoid intraday cluster failures, the EMR cluster must be highly available. When the cluster is terminated at the end of each business day, the data must persist.
Which configurations would enable the EMR cluster to meet these requirements? (Choose three.)
- A . EMR File System (EMRFS) for storage
- B . Hadoop Distributed File System (HDFS) for storage
- C . AWS Glue Data Catalog as the metastore for Apache Hive
- D . MySQL database on the master node as the metastore for Apache Hive
- E . Multiple master nodes in a single Availability Zone
- F . Multiple master nodes in multiple Availability Zones
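For context, below is a minimal boto3 sketch (with placeholder subnet, bucket, and role names) of a cluster that combines the configurations several options describe: data kept in Amazon S3 through EMRFS, the AWS Glue Data Catalog as the Hive metastore, and multiple master nodes. Note that EMR clusters, including multi-master clusters, run entirely within a single Availability Zone.

```python
import boto3

# Minimal sketch: launch an EMR cluster that uses the Glue Data Catalog as the
# Hive metastore and three master nodes for high availability. All IDs and
# names below are placeholders.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="business-hours-cluster",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}],
    Configurations=[
        {
            # Point Hive at the Glue Data Catalog so table metadata survives
            # cluster termination at the end of the business day.
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": (
                    "com.amazonaws.glue.catalog.metastore."
                    "AWSGlueDataCatalogHiveClientFactory"
                )
            },
        }
    ],
    Instances={
        "InstanceGroups": [
            # Three master nodes enable the EMR multi-master (HA) feature;
            # they all run in a single Availability Zone.
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 3},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "Ec2SubnetId": "subnet-0123456789abcdef0",  # placeholder
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    # EMRFS: jobs read and write s3:// paths, so the data lives in S3 and
    # persists after the cluster is terminated.
    LogUri="s3://example-emr-logs/",  # placeholder bucket
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```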
A financial company uses Amazon Athena to query data from an Amazon S3 data lake. Files are stored in the S3 data lake in Apache ORC format. Data analysts recently introduced nested fields in the data lake ORC files and noticed that queries are taking longer to run in Athena. A data analyst discovered that more data than required is being scanned for the queries.
What is the MOST operationally efficient solution to improve query performance?
- A . Flatten nested data and create separate files for each nested dataset.
- B . Use the Athena query engine V2 and push the query filter to the source ORC file.
- C . Use Apache Parquet format instead of ORC format.
- D . Recreate the data partition strategy and further narrow down the data filter criteria.
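For context, a minimal boto3 sketch of the approach option D describes: tightening the partition predicate and selecting only the nested fields a query needs, so Athena scans less data. The database, table, column, and bucket names are hypothetical.

```python
import boto3

# Minimal sketch: run an Athena query that filters on a partition column and
# reads only the nested field it needs, reducing the amount of data scanned.
athena = boto3.client("athena", region_name="us-east-1")

query = """
SELECT order_id,
       customer.address.city AS city      -- read only the nested field needed
FROM   sales_orc
WHERE  dt = '2023-06-01'                  -- partition predicate limits the scan
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "analytics_db"},   # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(execution["QueryExecutionId"])
```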
A retail company has 15 stores across 6 cities in the United States. Once a month, the sales team requests an Amazon QuickSight visualization that makes it easy to identify revenue trends across cities and stores. The visualization also helps identify outliers that need to be examined further.
Which visual type in QuickSight meets the sales team’s requirements?
- A . Geospatial chart
- B . Line chart
- C . Heat map
- D . Tree map
A banking company is currently using an Amazon Redshift cluster with dense storage (DS) nodes to store sensitive data. An audit found that the cluster is unencrypted. Compliance requirements state that a database with sensitive data must be encrypted through a hardware security module (HSM) with automated key rotation.
Which combination of steps is required to achieve compliance? (Choose two.)
- A . Set up a trusted connection with HSM using a client and server certificate with automatic key rotation.
- B . Modify the cluster with an HSM encryption option and automatic key rotation.
- C . Create a new HSM-encrypted Amazon Redshift cluster and migrate the data to the new cluster.
- D . Enable HSM with key rotation through the AWS CLI.
- E . Enable Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) encryption in the HSM.
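For context, a minimal boto3 sketch of the HSM-related steps the options describe: creating a new Redshift cluster encrypted through an HSM (the client certificate and HSM configuration identifiers, placeholders here, establish the trusted client/server certificate connection to the HSM) and rotating the encryption keys. Rotation is an explicit API call that can be scheduled externally to automate it.

```python
import boto3

# Minimal sketch: create a new HSM-encrypted Redshift cluster and rotate its
# encryption keys. All names and identifiers are placeholders.
redshift = boto3.client("redshift", region_name="us-east-1")

redshift.create_cluster(
    ClusterIdentifier="secure-cluster",
    NodeType="ds2.xlarge",
    NumberOfNodes=2,
    MasterUsername="admin",
    MasterUserPassword="ExamplePassw0rd!",  # placeholder; use a secrets store
    Encrypted=True,
    HsmClientCertificateIdentifier="example-hsm-client-cert",  # placeholder
    HsmConfigurationIdentifier="example-hsm-configuration",    # placeholder
)

# Wait until the cluster is available before rotating keys.
redshift.get_waiter("cluster_available").wait(ClusterIdentifier="secure-cluster")

# Key rotation can be invoked on a schedule (e.g., from a Lambda function)
# to satisfy an automated-rotation requirement.
redshift.rotate_encryption_key(ClusterIdentifier="secure-cluster")
```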
A manufacturing company is storing data from its operational systems in Amazon S3. The company’s business analysts need to perform one-time queries of the data in Amazon S3 with Amazon Athena. The company needs to access the Athena service from the on-premises network by using a JDBC connection. The company has created a VPC. Security policies mandate that requests to AWS services cannot traverse the internet.
Which combination of steps should a data analytics specialist take to meet these requirements? (Select TWO.)
- A . Establish an AWS Direct Connect connection between the on-premises network and the VPC.
- B . Configure the JDBC connection to connect to Athena through Amazon API Gateway.
- C . Configure the JDBC connection to use a gateway VPC endpoint for Amazon S3.
- D . Configure the JDBC connection to use an interface VPC endpoint for Athena.
- E . Deploy Athena within a private subnet.
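For context, a minimal boto3 sketch of the endpoint configurations that options C and D describe, with placeholder VPC, subnet, security group, and route table IDs. Combined with a Direct Connect link, these endpoints keep JDBC requests to Athena, and Athena’s result fetches from S3, off the public internet.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface endpoint for Athena: the JDBC driver is pointed at this
# endpoint's private DNS name instead of the public Athena endpoint.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                  # placeholder
    ServiceName="com.amazonaws.us-east-1.athena",
    SubnetIds=["subnet-0123456789abcdef0"],         # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"],      # placeholder
    PrivateDnsEnabled=True,
)

# Gateway endpoint for S3: keeps Athena result retrieval from S3 on the
# AWS network.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",                  # placeholder
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],        # placeholder
)
```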
A company stores its sales and marketing data that includes personally identifiable information (PII) in Amazon S3. The company allows its analysts to launch their own Amazon EMR cluster and run analytics reports with the data. To meet compliance requirements, the company must ensure the data is not publicly accessible throughout this process. A data engineer has secured Amazon S3 but must ensure the individual EMR clusters created by the analysts are not exposed to the public internet.
Which solution should the data engineer use to meet this compliance requirement with the LEAST amount of effort?
- A . Create an EMR security configuration and ensure the security configuration is associated with the EMR clusters when they are created.
- B . Check the security group of the EMR clusters regularly to ensure it does not allow inbound traffic from IPv4 0.0.0.0/0 or IPv6 ::/0.
- C . Enable the block public access setting for Amazon EMR at the account level before any EMR cluster is created.
- D . Use AWS WAF to block public internet access to the EMR clusters across the board.
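For context, a minimal boto3 sketch of the account-level block public access setting that option C describes. It is applied once per account and Region, before any clusters are created, and then governs every cluster the analysts launch.

```python
import boto3

# Minimal sketch: enable EMR block public access at the account level.
# Afterwards, clusters cannot be created with security groups that allow
# inbound traffic from 0.0.0.0/0 or ::/0 (port 22 is exempted here as an
# example of the optional exception list).
emr = boto3.client("emr", region_name="us-east-1")

emr.put_block_public_access_configuration(
    BlockPublicAccessConfiguration={
        "BlockPublicSecurityGroupRules": True,
        "PermittedPublicSecurityGroupRuleRanges": [
            {"MinRange": 22, "MaxRange": 22}  # optional SSH exception
        ],
    }
)
```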
A company is using an AWS Lambda function to run Amazon Athena queries against a cross-account AWS Glue Data Catalog.
A query returns the following error: HIVE METASTORE ERROR
The error message states that the response payload size exceeds the maximum allowed payload size. The queried table is already partitioned, and the data is stored in an Amazon S3 bucket in the Apache Hive partition format.
Which solution will resolve this error?
- A . Modify the Lambda function to upload the query response payload as an object into the S3 bucket. Include an S3 object presigned URL as the payload in the Lambda function response.
- B . Run the MSCK REPAIR TABLE command on the queried table.
- C . Create a separate folder in the S3 bucket. Move the data files that need to be queried into that folder. Create an AWS Glue crawler that points to the folder instead of the S3 bucket.
- D . Check the schema of the queried table for any characters that Athena does not support. Replace any unsupported characters with characters that Athena supports.
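For context, a minimal sketch of the pattern option A describes: the Lambda function stores the full query result in S3 and returns only a short-lived presigned URL, keeping the response payload small. The bucket name and the run_athena_query helper are hypothetical stand-ins for the existing query logic.

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "example-query-results"  # placeholder bucket


def run_athena_query(event):
    # Hypothetical stand-in for the existing Athena query logic.
    return json.dumps({"rows": []})


def handler(event, context):
    result_json = run_athena_query(event)

    # Store the (potentially large) result in S3 instead of returning it.
    key = f"results/{context.aws_request_id}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=result_json.encode("utf-8"))

    # Return only a short-lived presigned URL; the response payload now has
    # a small, fixed size regardless of how large the query result is.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=3600,  # URL valid for one hour
    )
    return {"statusCode": 200, "body": json.dumps({"resultsUrl": url})}
```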
A company hosts its analytics solution on premises. The analytics solution includes a server that collects log files. The analytics solution uses an Apache Hadoop cluster to analyze the log files hourly and to produce output files. All the files are archived to another server for a specified duration.
The company is expanding globally and plans to move the analytics solution to multiple AWS Regions in the AWS Cloud. The company must adhere to the data archival and retention requirements of each country where the data is stored.
Which solution will meet these requirements?
- A . Create an Amazon S3 bucket in one Region to collect the log files. Use S3 event notifications to invoke an AWS Glue job for log analysis. Store the output files in the target S3 bucket. Use S3 Lifecycle rules on the target S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
- B . Create a Hadoop Distributed File System (HDFS) file system on an Amazon EMR cluster in one Region to collect the log files. Set up a bootstrap action on the EMR cluster to run an Apache Spark job. Store the output files in a target Amazon S3 bucket. Schedule a job on one of the EMR nodes to delete files that no longer need to be retained.
- C . Create an Amazon S3 bucket in each Region to collect log files. Create an Amazon EMR cluster. Submit steps on the EMR cluster for analysis. Store the output files in a target S3 bucket in each Region. Use S3 Lifecycle rules on each target S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
- D . Create an Amazon Kinesis Data Firehose delivery stream in each Region to collect log data. Specify an Amazon S3 bucket in each Region as the destination. Use S3 Storage Lens for data analysis. Use S3 Lifecycle rules on each destination S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
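For context, a minimal boto3 sketch of a per-Region lifecycle rule of the kind several options mention. The bucket name and the 365-day retention period are assumptions; each Region’s target bucket would receive an expiration period matching the retention requirement of its country.

```python
import boto3

# Minimal sketch: expire output files in one Region's target bucket after the
# assumed retention period for that country has elapsed.
s3 = boto3.client("s3", region_name="eu-central-1")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-output-eu-central-1",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-per-country-retention",
                "Filter": {"Prefix": "output/"},
                "Status": "Enabled",
                "Expiration": {"Days": 365},  # assumed retention period
            }
        ]
    },
)
```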
A manufacturing company has many IoT devices in different facilities across the world. The company is using Amazon Kinesis Data Streams to collect the data from the devices.
The company’s operations team has started to observe many WriteThroughputExceeded exceptions. The operations team determines that the cause is the number of records that are being written to certain shards. The data contains device ID, capture date, measurement type, measurement value, and facility ID. The facility ID is used as the partition key.
Which action will resolve this issue?
- A . Change the partition key from facility ID to a randomly generated key.
- B . Increase the number of shards.
- C . Archive the data on the producers’ side.
- D . Change the partition key from facility ID to capture date.
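For context, a minimal boto3 sketch of the re-keying approach option A describes: publishing records with a high-cardinality, randomly generated partition key so writes spread evenly across shards instead of concentrating on a few hot shards. The stream name and record fields are placeholders based on the scenario.

```python
import json
import uuid
import boto3

# Minimal sketch: write an IoT record with a random partition key rather than
# the low-cardinality facility ID.
kinesis = boto3.client("kinesis", region_name="us-east-1")

record = {
    "device_id": "sensor-42",
    "capture_date": "2023-06-01T12:00:00Z",
    "measurement_type": "temperature",
    "measurement_value": 21.7,
    "facility_id": "facility-7",  # kept in the payload, no longer the key
}

kinesis.put_record(
    StreamName="iot-measurements",  # placeholder
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=str(uuid.uuid4()),  # random key distributes load across shards
)
```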
A software company wants to use instrumentation data to detect and resolve errors to improve application recovery time. The company requires API usage anomalies, like error rate and response time spikes, to be detected in near-real time (NRT). The company also requires that data analysts have access to dashboards for log analysis in NRT.
Which solution meets these requirements?
- A . Use Amazon Kinesis Data Firehose as the data transport layer for logging data. Use Amazon Kinesis Data Analytics to uncover the NRT API usage anomalies. Use Kinesis Data Firehose to deliver log data to Amazon OpenSearch Service (Amazon Elasticsearch Service) for search, log analytics, and application monitoring. Use OpenSearch Dashboards (Kibana) in Amazon OpenSearch Service (Amazon Elasticsearch Service) for the dashboards.
- B . Use Amazon Kinesis Data Analytics as the data transport layer for logging data. Use Amazon Kinesis Data Streams to uncover NRT monitoring metrics. Use Amazon Kinesis Data Firehose to deliver log data to Amazon OpenSearch Service (Amazon Elasticsearch Service) for search, log analytics, and application monitoring. Use Amazon QuickSight for the dashboards.
- C . Use Amazon Kinesis Data Analytics as the data transport layer for logging data and to uncover NRT monitoring metrics. Use Amazon Kinesis Data Firehose to deliver log data to Amazon OpenSearch Service (Amazon Elasticsearch Service) for search, log analytics, and application monitoring. Use OpenSearch Dashboards (Kibana) in Amazon OpenSearch Service (Amazon Elasticsearch Service) for the dashboards.
- D . Use Amazon Kinesis Data Firehose as the data transport layer for logging data. Use Amazon Kinesis Data Analytics to uncover NRT monitoring metrics. Use Amazon Kinesis Data Streams to deliver log data to Amazon OpenSearch Service (Amazon Elasticsearch Service) for search, log analytics, and application monitoring. Use Amazon QuickSight for the dashboards.
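For context, a minimal boto3 sketch of the transport step that the Firehose-based options describe: the application ships API log events to a Kinesis Data Firehose delivery stream, from which a Kinesis Data Analytics application can detect anomalies in NRT (for example with its built-in RANDOM_CUT_FOREST function) and Firehose can deliver the logs on to Amazon OpenSearch Service. The stream name and event fields are placeholders.

```python
import json
import boto3

# Minimal sketch: batch API log events into a Firehose delivery stream that
# feeds the downstream anomaly detection and OpenSearch Service indexing.
firehose = boto3.client("firehose", region_name="us-east-1")

events = [
    {"api": "/orders", "status": 500, "latency_ms": 950},
    {"api": "/orders", "status": 200, "latency_ms": 120},
]

firehose.put_record_batch(
    DeliveryStreamName="api-log-stream",  # placeholder
    Records=[{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events],
)
```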