Professional-Data-Engineer by Google Actual Free Exam Q&As

Question 1

Your company is in the process of migrating its on-premises data warehousing solutions to BigQuery. The existing data warehouse uses trigger-based change data capture (CDC) to apply updates from multiple transactional database sources on a daily basis. With BigQuery, your company hopes to improve its handling of CDC so that changes to the source systems are available to query in BigQuery in near-real time using log-based CDC streams, while also optimizing for the performance of applying changes to the data warehouse. Which two steps should they take to ensure that changes are available in the BigQuery reporting table with minimal latency while reducing compute overhead? (Choose two.)

A. Insert each new CDC record and corresponding operation type to a staging table in real time.
B. Periodically DELETE outdated records from the reporting table.
C. Perform a DML INSERT, UPDATE, or DELETE to replicate each individual CDC record in real time directly on the reporting table.
D. Periodically use a DML MERGE to perform several DML INSERT, UPDATE, and DELETE operations at the same time on the reporting table.
E. Insert each new CDC record and corresponding operation type in real time to the reporting table, and use a materialized view to expose only the newest version of each unique record.

Correct Answer: A, D
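
The staging-then-MERGE pattern in options A and D can be sketched in plain Python. This is only an in-memory simulation, not BigQuery code: the `CdcRecord` type, the `seq` ordering field, and the `merge_staging` function are hypothetical names chosen for illustration. It shows why batching helps: several changes to the same key collapse into one write on the reporting table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdcRecord:
    key: str     # primary key of the source row
    op: str      # "INSERT", "UPDATE", or "DELETE"
    value: dict  # row payload (ignored for DELETE)
    seq: int     # assumed log sequence number; higher means newer


def merge_staging(reporting: dict, staging: list) -> dict:
    """Apply a batch of staged CDC records to the reporting table in one pass."""
    # Keep only the newest change per key, so each key is written at most once.
    latest = {}
    for rec in staging:
        if rec.key not in latest or rec.seq > latest[rec.key].seq:
            latest[rec.key] = rec

    # Apply the surviving change for each key: delete, or insert/overwrite.
    result = dict(reporting)
    for key, rec in latest.items():
        if rec.op == "DELETE":
            result.pop(key, None)
        else:
            result[key] = rec.value
    return result


reporting = {"a": {"qty": 1}, "b": {"qty": 2}}
staging = [
    CdcRecord("a", "UPDATE", {"qty": 5}, seq=1),
    CdcRecord("c", "INSERT", {"qty": 7}, seq=2),
    CdcRecord("a", "UPDATE", {"qty": 9}, seq=3),
    CdcRecord("b", "DELETE", {}, seq=4),
]
merged = merge_staging(reporting, staging)
```

Here the two updates to key `a` result in a single write, which is the compute saving that a periodic MERGE aims for compared with applying every CDC record individually.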

Question 2

You are creating a data model in BigQuery that will hold retail transaction data. Your two largest tables, sales_transaction_header and sales_transaction_line, have a tightly coupled immutable relationship. These tables are rarely modified after load and are frequently joined when queried.
You need to model the sales_transaction_header and sales_transaction_line tables to improve the performance of data analytics queries. What should you do?

A. Create a sales_transaction table that holds the sales_transaction_header and sales_transaction_line information as rows, duplicating the sales_transaction_header data for each line.
B. Create a sales_transaction table that stores the sales_transaction_header and sales_transaction_line data as a JSON data type.
C. Create a sales_transaction table that holds the sales_transaction_header information as rows and the sales_transaction_line rows as nested and repeated fields.
D. Create separate sales_transaction_header and sales_transaction_line tables and, when querying, specify the sales_transaction_line first in the WHERE clause.

Correct Answer: C
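
The shape that option C describes can be illustrated with a small Python sketch. It is not BigQuery code; the `nest_lines` function and the column names (`transaction_id`, `store`, `sku`, `qty`) are assumptions made up for the example. Each header becomes one row, and its lines are embedded as a repeated field, so no join is needed at query time.

```python
from collections import defaultdict


def nest_lines(headers, lines):
    """Return one record per header with its lines as a repeated field."""
    lines_by_txn = defaultdict(list)
    for line in lines:
        # Drop the join key from the nested item; the parent row carries it.
        item = {k: v for k, v in line.items() if k != "transaction_id"}
        lines_by_txn[line["transaction_id"]].append(item)
    return [
        {**header, "lines": lines_by_txn.get(header["transaction_id"], [])}
        for header in headers
    ]


headers = [
    {"transaction_id": "t1", "store": "NYC"},
    {"transaction_id": "t2", "store": "SFO"},
]
lines = [
    {"transaction_id": "t1", "sku": "apple", "qty": 2},
    {"transaction_id": "t1", "sku": "pear", "qty": 1},
    {"transaction_id": "t2", "sku": "plum", "qty": 4},
]
sales_transaction = nest_lines(headers, lines)
```

Unlike option A, the header fields are stored once per transaction rather than once per line.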

Question 3

What is the recommended way to switch between SSD and HDD storage for your Google Cloud Bigtable instance?

A. Export the data from the existing instance and import the data into a new instance.
B. Run parallel instances where one is HDD and the other is SSD.
C. The selection is final and you must resume using the same storage type.
D. Create a third instance and sync the data from the two storage types via batch jobs.

Correct Answer: A

Question 4

Cloud Dataproc is a managed Apache Hadoop and Apache _____ service.

A. Blaze
B. Ignite
C. Fire
D. Spark

Correct Answer: D

Question 5

The Development and External teams have the project viewer Identity and Access Management (IAM) role in a folder named Visualization. You want the Development Team to be able to read data from both Cloud Storage and BigQuery, but the External Team should only be able to read data from BigQuery. What should you do?

A. Create a VPC Service Controls perimeter containing both projects and Cloud Storage as a restricted API. Add the Development Team users to the perimeter's Access Level.
B. Remove Cloud Storage IAM permissions for the External Team on the acme-raw-data project.
C. Create a VPC Service Controls perimeter containing both projects and BigQuery as a restricted API. Add the External Team users to the perimeter's Access Level.
D. Create Virtual Private Cloud (VPC) firewall rules on the acme-raw-data project that deny all ingress traffic from the External Team CIDR range.

Correct Answer: A

Question 6

You are architecting a data transformation solution for BigQuery. Your developers are proficient with SQL and want to use the ELT development technique. In addition, your developers need an intuitive coding environment and the ability to manage SQL as code. You need to identify a solution for your developers to build these pipelines. What should you do?

A. Use Data Fusion to build and execute ETL pipelines.
B. Use Dataform to build, manage, and schedule SQL pipelines.
C. Use Cloud Composer to load data and run SQL pipelines by using the BigQuery job operators.
D. Use Dataflow jobs to read data from Pub/Sub, transform the data, and load the data to BigQuery.

Correct Answer: B

Question 7

Your company stores vital operational sales data in a BigQuery dataset in us-central1. Your company requires a disaster recovery plan to restore this data to us-east1 with a recovery point objective (RPO) of 24 hours and a recovery time objective (RTO) of 4 hours if us-central1 experiences an outage. You need to implement the disaster recovery plan while keeping costs and complexity to a minimum. What should you do?

A. Take daily BigQuery table snapshots in us-central1.
B. Manually export data to a CSV file in a multi-regional Cloud Storage bucket daily and use bq load to restore to us-east1.
C. Configure BigQuery cross-region dataset replication from us-central1 to us-east1.
D. Set up continuous queries and Pub/Sub to stream data changes from BigQuery tables in us-central1 to us-east1.

Correct Answer: B

Question 8

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

A. Disable caching by editing the report settings.
B. Refresh your browser tab showing the visualizations.
C. Clear your browser history for the past hour, then reload the tab showing the visualizations.
D. Disable caching in BigQuery by editing table details.

Correct Answer: A

Question 9

You are using BigQuery with a multi-region dataset that includes a table with the daily sales volumes. This table is updated multiple times per day. You need to protect your sales table in case of regional failures with a recovery point objective (RPO) of less than 24 hours, while keeping costs to a minimum. What should you do?

A. Schedule a daily copy of the dataset to a backup region.
B. Schedule a daily export of the table to a Cloud Storage dual-region or multi-region bucket.
C. Modify the ETL job to load the data into both the current region and a backup region.
D. Schedule a daily BigQuery snapshot of the table.

Correct Answer: B

Question 10

Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)

A. A good use for the wide and deep model is a recommender system.
B. A good use for the wide and deep model is a small-scale linear regression problem.
C. The wide model is used for generalization, while the deep model is used for memorization.
D. The wide model is used for memorization, while the deep model is used for generalization.

Correct Answer: A, D
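
The split between memorization and generalization can be made concrete with a toy forward pass. The sketch below is a hand-rolled illustration in plain Python, not the TensorFlow implementation: the feature names, embedding values, and weights are all invented for the example. The wide part is a linear model over crossed features, the deep part is a tiny network over averaged embeddings, and their logits are summed before a sigmoid.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def wide_logit(cross_features, wide_weights):
    """Linear model over crossed features: memorizes specific co-occurrences."""
    return sum(wide_weights.get(f, 0.0) for f in cross_features)


def deep_logit(category_ids, embeddings, hidden_w, out_w):
    """One ReLU layer over averaged embeddings: generalizes across similar items."""
    vectors = [embeddings[c] for c in category_ids]
    avg = [sum(col) / len(vectors) for col in zip(*vectors)]
    hidden = [max(0.0, sum(w * x for w, x in zip(row, avg))) for row in hidden_w]
    return sum(w * h for w, h in zip(out_w, hidden))


def predict(cross_features, category_ids, params):
    """Sum the wide and deep logits plus a bias, then squash to a probability."""
    logit = (
        wide_logit(cross_features, params["wide"])
        + deep_logit(category_ids, params["emb"], params["hidden"], params["out"])
        + params["bias"]
    )
    return sigmoid(logit)


# Hand-picked, hypothetical parameters; a real model would learn these jointly.
params = {
    "wide": {"installed=maps AND impression=transit": 2.0},
    "emb": {"maps": [1.0, 0.0], "transit": [0.0, 1.0], "games": [-1.0, 0.0]},
    "hidden": [[1.0, 1.0], [1.0, -1.0]],
    "out": [1.0, 0.5],
    "bias": -1.0,
}

seen = predict(["installed=maps AND impression=transit"], ["maps", "transit"], params)
unseen = predict(["installed=games AND impression=transit"], ["games", "transit"], params)
```

The crossed feature only contributes when that exact combination has a learned weight, while the embedding path still produces a score for a combination the wide part has never seen.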

Question 11

You work for a global shipping company. You want to train a model on 40 TB of data to predict which ships in each geographic region are likely to cause delivery delays on any given day. The model will be based on multiple attributes collected from multiple sources. Telemetry data, including location in GeoJSON format, will be pulled from each ship and loaded every hour. You want to have a dashboard that shows how many and which ships are likely to cause delays within a region. You want to use a storage solution that has native functionality for prediction and geospatial processing. Which storage solution should you use?

A. Cloud Datastore
B. Cloud Bigtable
C. Cloud SQL for PostgreSQL
D. BigQuery

Correct Answer: D

Google Certified Professional Data Engineer - Professional-Data-Engineer Exam Practice Test