Google Certified Professional Data Engineer - Professional-Data-Engineer Exam Practice Test

Your company is in the process of migrating its on-premises data warehousing solutions to BigQuery. The existing data warehouse uses trigger-based change data capture (CDC) to apply updates from multiple transactional database sources on a daily basis. With BigQuery, your company hopes to improve its handling of CDC so that changes to the source systems are available to query in BigQuery in near-real time using log-based CDC streams, while also optimizing for the performance of applying changes to the data warehouse. Which two steps should they take to ensure that changes are available in the BigQuery reporting table with minimal latency while reducing compute overhead? (Choose two.)
Correct Answer: A,D
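As a rough illustration of what applying a log-based CDC stream involves, the sketch below collapses a batch of change records to the latest change per key before touching the target table, which is one way to cut down the work of applying changes. It is plain Python rather than a BigQuery API, and the record fields (`seq`, `id`, `op`, `row`) and the function name `apply_changes` are hypothetical names chosen for this example. The answer options are not reproduced here, so this does not confirm what options A and D say.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def apply_changes(table: Dict[int, Row], changes: List[Dict[str, Any]]) -> Dict[int, Row]:
    """Apply a batch of CDC records to an in-memory table keyed by primary key.

    Each change is a dict with hypothetical fields:
      seq: monotonically increasing log sequence number
      id:  primary key of the affected row
      op:  "INSERT", "UPDATE", or "DELETE"
      row: full row image (None for deletes)
    """
    # Keep only the most recent change per key, so each key is written once.
    latest: Dict[int, Dict[str, Any]] = {}
    for change in sorted(changes, key=lambda c: c["seq"]):
        latest[change["id"]] = change

    for key, change in latest.items():
        if change["op"] == "DELETE":
            table.pop(key, None)  # deleting a missing key is a no-op
        else:
            row: Optional[Row] = change["row"]
            table[key] = dict(row or {})  # INSERT and UPDATE both act as upserts
    return table


reporting_table = {1: {"name": "a"}, 2: {"name": "b"}}
change_log = [
    {"seq": 3, "id": 1, "op": "UPDATE", "row": {"name": "a2"}},
    {"seq": 1, "id": 3, "op": "INSERT", "row": {"name": "c"}},
    {"seq": 2, "id": 2, "op": "DELETE", "row": None},
    {"seq": 4, "id": 3, "op": "UPDATE", "row": {"name": "c2"}},
]
result = apply_changes(reporting_table, change_log)
```

Collapsing to one change per key means a row that was inserted and then updated within the same batch is written only once.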
You are creating a data model in BigQuery that will hold retail transaction data. Your two largest tables, sales_transaction_header and sales_transaction_line, have a tightly coupled immutable relationship. These tables are rarely modified after load and are frequently joined when queried.
You need to model the sales_transaction_header and sales_transaction_line tables to improve the performance of data analytics queries. What should you do?
Correct Answer: C
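For tightly coupled parent and child tables that are always joined, one widely documented BigQuery modeling pattern is to denormalize the child rows into a nested, repeated field on the parent. The answer options are not reproduced here, so the sketch below only illustrates that pattern in plain Python and does not confirm what option C says. The column names (`transaction_id`, `store`, `sku`, `qty`) and the function name `nest_lines` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def nest_lines(headers: List[Record], lines: List[Record]) -> List[Record]:
    """Fold line rows into a repeated 'lines' field on each header row."""
    lines_by_txn: Dict[Any, List[Record]] = defaultdict(list)
    for line in lines:
        # Drop the join key from the nested item; the parent already carries it.
        item = {k: v for k, v in line.items() if k != "transaction_id"}
        lines_by_txn[line["transaction_id"]].append(item)

    # Headers without lines get an empty repeated field.
    return [
        {**header, "lines": lines_by_txn.get(header["transaction_id"], [])}
        for header in headers
    ]


sales_transaction_header = [
    {"transaction_id": "t1", "store": "north"},
    {"transaction_id": "t2", "store": "south"},
]
sales_transaction_line = [
    {"transaction_id": "t1", "sku": "apple", "qty": 2},
    {"transaction_id": "t1", "sku": "pear", "qty": 1},
]
nested = nest_lines(sales_transaction_header, sales_transaction_line)
```

With the lines stored inside the header record, a query that needs both no longer has to join two large tables at read time.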
What is the recommended way to switch between SSD and HDD storage for your Google Cloud Bigtable instance?
Correct Answer: A
Cloud Dataproc is a managed Apache Hadoop and Apache _____ service.
Correct Answer: D
The Development and External teams have the project viewer Identity and Access Management (IAM) role in a folder named Visualization. You want the Development Team to be able to read data from both Cloud Storage and BigQuery, but the External Team should only be able to read data from BigQuery. What should you do?
Correct Answer: A
You are architecting a data transformation solution for BigQuery. Your developers are proficient with SQL and want to use the ELT development technique. In addition, your developers need an intuitive coding environment and the ability to manage SQL as code. You need to identify a solution for your developers to build these pipelines. What should you do?
Correct Answer: B
Your company stores vital operational sales data in a BigQuery dataset in us-central1. Your company requires a disaster recovery plan to restore this data to us-east1 with a recovery point objective (RPO) of 24 hours and a recovery time objective (RTO) of 4 hours if us-central1 experiences an outage. You need to implement the disaster recovery plan while keeping costs and complexity to a minimum. What should you do?
Correct Answer: B
You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?
Correct Answer: A
You are using BigQuery with a multi-region dataset that includes a table with the daily sales volumes. This table is updated multiple times per day. You need to protect your sales table in case of regional failures with a recovery point objective (RPO) of less than 24 hours, while keeping costs to a minimum. What should you do?
Correct Answer: B
Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)
Correct Answer: A,D
You work for a global shipping company. You want to train a model on 40 TB of data to predict which ships in each geographic region are likely to cause delivery delays on any given day. The model will be based on multiple attributes collected from multiple sources. Telemetry data, including location in GeoJSON format, will be pulled from each ship and loaded every hour. You want to have a dashboard that shows how many and which ships are likely to cause delays within a region. You want to use a storage solution that has native functionality for prediction and geospatial processing. Which storage solution should you use?
Correct Answer: D