Practice Free Professional Data Engineer Exam Online Questions
You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?
- A . Both batch and streaming
- B . BigQuery cannot be used as a sink
- C . Only batch
- D . Only streaming
A
Explanation:
When you apply a BigQueryIO.Write transform in batch mode to write to a single table, Dataflow invokes a BigQuery load job. When you apply a BigQueryIO.Write transform in streaming mode, or in batch mode using a function to specify the destination table, Dataflow uses BigQuery's streaming inserts.
Reference: https://cloud.google.com/dataflow/model/bigquery-io
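As an illustration, here is a minimal Apache Beam (Python SDK) sketch; the project, dataset, table, and schema are hypothetical. In batch mode the FILE_LOADS method triggers a BigQuery load job, while a streaming pipeline would select STREAMING_INSERTS instead:

```python
# Minimal Apache Beam (Python SDK) sketch; project, dataset, table, and
# schema names are hypothetical. FILE_LOADS triggers a BigQuery load job;
# a streaming pipeline would use Method.STREAMING_INSERTS.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions()) as p:
    rows = p | "CreateRows" >> beam.Create([{"device_id": "d1", "temp": 21.5}])

    rows | "WriteToBQ" >> beam.io.WriteToBigQuery(
        "my-project:iot_dataset.readings",                  # hypothetical table
        schema="device_id:STRING,temp:FLOAT",
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,   # load job (batch)
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    )
```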
You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to 30 to 90 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries.
What should you do?
- A . Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE type.
- B . Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
- C . Modify your pipeline to maintain the last 30 to 90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
- D . Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.
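For context, option A refers to BigQuery's DDL support for partitioned tables. A hedged sketch of re-creating a table partitioned on a date column, run through the BigQuery Python client (project, dataset, table, and column names are hypothetical):

```python
# Hedged sketch of the DDL approach named in option A; all names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

ddl = """
CREATE TABLE `my-project.analytics.events_partitioned`
PARTITION BY DATE(event_timestamp)   -- partition on the TIMESTAMP column
AS SELECT * FROM `my-project.analytics.events`
"""

client.query(ddl).result()  # run the DDL job and wait for it to finish

# Queries that filter on the partitioning column now scan only the matching
# partitions, e.g. WHERE DATE(event_timestamp) >= '2024-01-01'.
```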
Your business users need a way to clean and prepare data before using the data for analysis. Your business users are less technically savvy and prefer to work with graphical user interfaces to define their transformations. After the data has been transformed, the business users want to perform their analysis directly in a spreadsheet. You need to recommend a solution that they can use.
What should you do?
- A . Use Dataprep to clean the data, and write the results to BigQuery. Analyze the data by using Connected Sheets.
- B . Use Dataprep to clean the data, and write the results to BigQuery. Analyze the data by using Looker Studio.
- C . Use Dataflow to clean the data, and write the results to BigQuery. Analyze the data by using Connected Sheets.
- D . Use Dataflow to clean the data, and write the results to BigQuery. Analyze the data by using Looker Studio.
A
Explanation:
For business users who are less technically savvy and prefer graphical user interfaces, Dataprep is an ideal tool for cleaning and preparing data, as it offers a user-friendly interface for defining data transformations without the need for coding. Once the data is cleaned and prepared, writing the results to BigQuery allows for the storage and management of large datasets. Analyzing the data using Connected Sheets enables business users to work within the familiar environment of a spreadsheet, leveraging the power of BigQuery directly within Google Sheets. This solution aligns with the needs of the users and follows Google’s recommended practices for data cleaning, preparation, and analysis.
Reference: Connected Sheets | Google Sheets | Google for Developers
Professional Data Engineer Certification Exam Guide | Learn – Google Cloud
Engineer Data in Google Cloud | Google Cloud Skills Boost – Qwiklabs
You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting messages from IoT devices globally. Because large parts of the globe have poor internet connectivity, messages sometimes batch at the edge, come in all at once, and cause a spike in load on your Kafka cluster. This is becoming difficult to manage and prohibitively expensive.
What is the Google-recommended cloud native architecture for this scenario?
- A . Edge TPUs as sensor devices for storing and transmitting the messages.
- B . Cloud Dataflow connected to the Kafka cluster to scale the processing of incoming messages.
- C . An IoT gateway connected to Cloud Pub/Sub, with Cloud Dataflow to read and process the messages from Cloud Pub/Sub.
- D . A Kafka cluster virtualized on Compute Engine in us-east with Cloud Load Balancing to connect to the devices around the world.
All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a
Cloud Bigtable node.
- A . before
- B . after
- C . only if
- D . once
A
Explanation:
In the Cloud Bigtable architecture, all client requests go through a front-end server before they are sent to a Cloud Bigtable node.
The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, a container for the cluster. Each node in the cluster handles a subset of the requests to the cluster.
Adding nodes to a cluster increases both the number of simultaneous requests the cluster can handle and the maximum throughput of the entire cluster.
Reference: https://cloud.google.com/bigtable/docs/overview
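As a rough illustration of that flow, the sketch below uses the Cloud Bigtable Python client (project, instance, table, and row key are hypothetical). The application only addresses the instance; routing through the front-end server to the node that serves the row is handled by Bigtable:

```python
# Hedged sketch: read a single row with the Cloud Bigtable Python client.
# Project, instance, table, and row key are hypothetical.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("iot-instance")
table = instance.table("sensor-readings")

row = table.read_row(b"device#1234#20240101")  # single-row lookup
if row is not None:
    for family, columns in row.cells.items():
        for qualifier, cells in columns.items():
            print(family, qualifier, cells[0].value)
```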
MJTelco is building a custom interface to share data.
They have these requirements:
- They need to do aggregations over their petabyte-scale datasets.
- They need to scan rows for specific time ranges with very fast response times (milliseconds).
Which combination of Google Cloud Platform products should you recommend?
- A . Cloud Datastore and Cloud Bigtable
- B . Cloud Bigtable and Cloud SQL
- C . BigQuery and Cloud Bigtable
- D . BigQuery and Cloud Storage
You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine.
Which learning algorithm should you use?
- A . Linear regression
- B . Logistic classification
- C . Recurrent neural network
- D . Feedforward neural network
You are migrating an application that tracks library books and information about each book, such as author or year published, from an on-premises data warehouse to BigQuery. In your current relational database, the author information is kept in a separate table and joined to the book information on a common key.
Based on Google’s recommended practice for schema design, how would you structure the data to ensure optimal speed of queries about the author of each book that has been borrowed?
- A . Keep the schema the same, maintain the different tables for the book and each of the attributes, and query as you are doing today
- B . Create a table that is wide and includes a column for each attribute, including the author's first name, last name, date of birth, etc.
- C . Create a table that includes information about the books and authors, but nest the author fields inside the author column
- D . Keep the schema the same, create a view that joins all of the tables, and always query the view
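For context, option C corresponds to BigQuery's nested RECORD fields. A hedged sketch of such a schema, created through the BigQuery Python client, with hypothetical dataset and field names:

```python
# Hedged sketch of a denormalized books table with author details kept as a
# nested RECORD column instead of a joined table; all names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

schema = [
    bigquery.SchemaField("book_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("title", "STRING"),
    bigquery.SchemaField("year_published", "INT64"),
    bigquery.SchemaField(
        "author",
        "RECORD",
        mode="REPEATED",  # a book can have several authors
        fields=[
            bigquery.SchemaField("first_name", "STRING"),
            bigquery.SchemaField("last_name", "STRING"),
            bigquery.SchemaField("date_of_birth", "DATE"),
        ],
    ),
]

table = bigquery.Table("my-project.library.books", schema=schema)
client.create_table(table)  # queries can then read author.* without a join
```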
Which of these statements about exporting data from BigQuery is false?
- A . To export more than 1 GB of data, you need to put a wildcard in the destination filename.
- B . The only supported export destination is Google Cloud Storage.
- C . Data can only be exported in JSON or Avro format.
- D . The only compression option available is GZIP.
C
Explanation:
Data can be exported in CSV, JSON, or Avro format. If you are exporting nested or repeated data, then CSV format is not supported.
Reference: https://cloud.google.com/bigquery/docs/exporting-data
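As an illustration, here is a hedged sketch of an export (extract) job using the BigQuery Python client, with hypothetical table and bucket names; note the wildcard in the destination URI and the Cloud Storage destination:

```python
# Hedged sketch of a BigQuery export job; table and bucket names are
# hypothetical. The wildcard in the destination URI is required once the
# export exceeds 1 GB, and the destination must be a Cloud Storage bucket.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON,
    compression=bigquery.Compression.GZIP,
)

extract_job = client.extract_table(
    "my-project.analytics.events",               # source table
    "gs://my-export-bucket/events-*.json.gz",    # wildcard destination URI
    job_config=job_config,
)
extract_job.result()  # wait for the export to complete
```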
You need to migrate a Redis database from an on-premises data center to a Memorystore for Redis instance. You want to follow Google-recommended practices and perform the migration for minimal cost, time, and effort.
What should you do?
- A . Make a secondary instance of the Redis database on a Compute Engine instance, and then perform a live cutover.
- B . Write a shell script to migrate the Redis data, and create a new Memorystore for Redis instance.
- C . Create a Dataflow job to read the Redis database from the on-premises data center, and write the data to a Memorystore for Redis instance.
- D . Make an RDB backup of the Redis database, use the gsutil utility to copy the RDB file into a Cloud Storage bucket, and then import the RDB file into the Memorystore for Redis instance.
D
Explanation:
The import and export feature uses the native RDB snapshot feature of Redis to import data into or export data out of a Memorystore for Redis instance. The use of the native RDB format prevents lock-in and makes it very easy to move data within Google Cloud or outside of Google Cloud. Import and export uses Cloud Storage buckets to store RDB files.
Reference: https://cloud.google.com/memorystore/docs/redis/import-export-overview
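A hedged sketch of the final import step using the Memorystore for Redis admin client (google-cloud-redis), assuming the RDB backup has already been copied to a Cloud Storage bucket with gsutil; project, region, instance, and bucket names are hypothetical:

```python
# Hedged sketch: import an RDB file from Cloud Storage into a Memorystore for
# Redis instance. All resource names are hypothetical; the instance's service
# account needs read access to the bucket.
from google.cloud import redis_v1

client = redis_v1.CloudRedisClient()

operation = client.import_instance(
    request=redis_v1.ImportInstanceRequest(
        name="projects/my-project/locations/us-central1/instances/my-redis",
        input_config=redis_v1.InputConfig(
            gcs_source=redis_v1.GcsSource(uri="gs://my-backup-bucket/dump.rdb")
        ),
    )
)
operation.result()  # long-running operation; wait for the import to finish
```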