Become Google Certified with updated Professional-Data-Engineer exam questions and correct answers
You work for a manufacturing plant that batches application log files together into a single log file once a day at 2:00 AM. You have written a Google Cloud Dataflow job to process that log file. You need to make sure the log file is processed once per day as inexpensively as possible. What should you do?
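One common approach to this scenario is to run the Dataflow job as a batch pipeline launched on a schedule shortly after 2:00 AM, for example from a Cloud Scheduler-triggered Cloud Function that launches a Dataflow template through the Dataflow REST API. Below is a minimal sketch using google-api-python-client; the project ID, bucket, template path, and the "inputFile" parameter are hypothetical placeholders, not part of the question.

```python
# Minimal sketch: launch a templated batch Dataflow job on a schedule
# (e.g., from a Cloud Scheduler-triggered Cloud Function). Project,
# bucket, and template paths below are hypothetical placeholders.
from googleapiclient.discovery import build

def launch_daily_log_job(event=None, context=None):
    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().templates().launch(
        projectId="my-project",                      # placeholder project ID
        gcsPath="gs://my-bucket/templates/log-job",  # placeholder template path
        body={
            "jobName": "daily-log-processing",
            "parameters": {
                # Assumed template parameter: path of the daily log file.
                "inputFile": "gs://my-bucket/logs/daily.log",
            },
        },
    )
    response = request.execute()
    return response["job"]["id"]
```

Because the job only needs to run once per day, a scheduled batch launch like this avoids paying for an always-on streaming pipeline.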
You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO. You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that it failed to extract data from the Cloud SQL instance due to a connection failure. You verified that VPC Service Controls and Shared VPC are not in use in these projects. You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?
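A typical resolution for this kind of scenario is to establish private connectivity between the projects (for example, VPC Network Peering) and run the Dataflow workers with private IPs on a subnetwork that can reach the Cloud SQL instance's private IP. The sketch below shows the Beam Python side of that setup, assuming the peering is already in place; the project, bucket, subnetwork, table, and credentials are all placeholders.

```python
# Sketch: run Dataflow workers on a subnetwork with private IPs only, so
# traffic to the Cloud SQL private IP stays off the public internet.
# Assumes VPC connectivity (e.g., VPC Network Peering) between the
# projects is already configured; all names below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.jdbc import ReadFromJdbc

options = PipelineOptions(
    runner="DataflowRunner",
    project="project-a",                 # placeholder: pipeline project
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # placeholder bucket
    subnetwork=(
        "https://www.googleapis.com/compute/v1/projects/project-a/"
        "regions/us-central1/subnetworks/private-subnet"  # placeholder
    ),
    use_public_ips=False,  # workers get private IPs only
)

with beam.Pipeline(options=options) as p:
    rows = p | "ReadFromCloudSQL" >> ReadFromJdbc(
        table_name="orders",             # placeholder table
        driver_class_name="com.mysql.cj.jdbc.Driver",
        jdbc_url="jdbc:mysql://10.0.0.5:3306/mydb",  # Cloud SQL private IP
        username="user",
        password="secret",               # use Secret Manager in practice
    )
```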
You are part of a healthcare organization where data is organized and managed by the respective data owners in various storage services. As a result of this decentralized ecosystem, discovering and managing data has become difficult. You need to quickly identify and implement a cost-optimized solution to assist your organization with the following:
• Data management and discovery
• Data lineage tracking
• Data quality validation
How should you build the solution?
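The three requirements listed above (discovery, lineage, quality) map to what Dataplex provides as a managed service on Google Cloud. As a hedged illustration only, the sketch below creates a Dataplex lake with the google-cloud-dataplex Python client so assets can be registered for discovery and covered by lineage and data-quality scans; the project, location, and lake IDs are placeholders.

```python
# Sketch: register data under a Dataplex lake so assets become
# discoverable and can be covered by lineage and data-quality scans.
# Assumes the google-cloud-dataplex client library; IDs are placeholders.
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()

operation = client.create_lake(
    parent="projects/my-project/locations/us-central1",  # placeholder
    lake_id="healthcare-lake",                           # placeholder
    lake=dataplex_v1.Lake(display_name="Healthcare data lake"),
)
lake = operation.result()  # create_lake is a long-running operation
print(lake.name)
```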
You want to schedule a number of sequential load and transformation jobs. Data files will be added to a Cloud Storage bucket by an upstream process, and there is no fixed schedule for when the new data arrives. Next, a Dataproc job is triggered to perform some transformations and write the data to BigQuery. You then need to run additional transformation jobs in BigQuery. The transformation jobs are different for every table, and these jobs might take hours to complete. You need to determine the most efficient and maintainable workflow to process hundreds of tables and provide the freshest data to your end users. What should you do?
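This dependency chain (file arrival, then Dataproc, then per-table BigQuery jobs) is the kind of workflow commonly expressed as a Cloud Composer (managed Airflow) DAG. The following is a minimal sketch under that assumption; the bucket, object, cluster, project, and SQL are hypothetical placeholders, and in practice one BigQuery task per table would be generated.

```python
# Sketch: a Cloud Composer (Airflow) DAG that waits for a new file,
# runs a Dataproc transformation, then per-table BigQuery jobs.
# Bucket, cluster, project, and SQL below are hypothetical placeholders.
import pendulum
from airflow import DAG
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="load_and_transform",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,  # no fixed cron; the sensor waits for data to arrive
) as dag:
    wait_for_file = GCSObjectExistenceSensor(
        task_id="wait_for_file",
        bucket="incoming-data",          # placeholder bucket
        object="data/latest.csv",        # placeholder object
    )

    dataproc_transform = DataprocSubmitJobOperator(
        task_id="dataproc_transform",
        project_id="my-project",         # placeholder project
        region="us-central1",
        job={  # placeholder PySpark job spec
            "placement": {"cluster_name": "etl-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/etl.py"},
        },
    )

    bq_transform = BigQueryInsertJobOperator(
        task_id="bq_transform_orders",   # one such task per table in practice
        configuration={
            "query": {
                "query": "CALL mydataset.transform_orders()",  # placeholder
                "useLegacySql": False,
            }
        },
    )

    wait_for_file >> dataproc_transform >> bq_transform
```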
You work for a bank. You have a labelled dataset that contains information on already granted loan applications and whether those applications defaulted. You have been asked to train a model to predict default rates for credit applicants. What should you do?
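Because each historical application carries a default/no-default label, this frames naturally as supervised binary classification. As a hedged, self-contained illustration, the sketch below trains a logistic-regression classifier on synthetic placeholder features standing in for real loan attributes and evaluates it with ROC AUC.

```python
# Sketch: the labelled default/no-default data frames this as supervised
# binary classification. Minimal logistic-regression example with
# synthetic placeholder features standing in for real loan attributes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))          # placeholder applicant features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)  # 1 = defaulted

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # predicted default probability
print("ROC AUC:", roc_auc_score(y_test, probs))
```

The predicted probabilities, rather than hard class labels, are what map to the "default rates" the question asks about.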