Free Databricks Databricks-Certified-Professional-Data-Engineer Exam Questions

Become Databricks Certified with updated Databricks-Certified-Professional-Data-Engineer exam questions and correct answers


Total 247 Questions | Updated On: Aug 04, 2023


Question 1

Which of the following is true of Delta Lake and the Lakehouse?

A : Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.

B : Delta Lake automatically collects statistics on the first 32 columns of each table which are leveraged in data skipping based on query filters.

C : Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.

D : Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.

E : Z-order can only be applied to numeric values stored in Delta Lake tables.

Answer: B
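To make the data skipping idea in option B concrete, here is a minimal sketch in plain Python. It is a toy model, not Delta Lake's implementation: the `FileStats` class and both helper functions are hypothetical names introduced for illustration. It assumes each data file stores min/max values only for the first N columns (32 by default, which Delta Lake exposes through the `delta.dataSkippingNumIndexedCols` table property), and that a filter can skip a file only when statistics exist for the filtered column.

```python
from dataclasses import dataclass

# Toy model of file-level data skipping; NOT Delta Lake's actual implementation.
NUM_INDEXED_COLS = 32  # mirrors the default described in option B


@dataclass
class FileStats:
    path: str
    min_max: dict  # column name -> (min, max), recorded only for indexed columns


def collect_stats(path, rows, column_order, num_indexed=NUM_INDEXED_COLS):
    """Record min/max only for the first `num_indexed` columns of a file."""
    indexed = column_order[:num_indexed]
    stats = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in indexed}
    return FileStats(path, stats)


def files_to_scan(files, column, value):
    """Return paths of files that might contain rows where column == value."""
    keep = []
    for f in files:
        bounds = f.min_max.get(column)
        # A file without statistics for this column can never be skipped.
        if bounds is None or bounds[0] <= value <= bounds[1]:
            keep.append(f.path)
    return keep


# Hypothetical two-column table where only the first column is indexed.
cols = ["id", "note"]
f1 = collect_stats("part-0", [{"id": 1, "note": "a"}, {"id": 5, "note": "b"}], cols, num_indexed=1)
f2 = collect_stats("part-1", [{"id": 10, "note": "c"}, {"id": 20, "note": "d"}], cols, num_indexed=1)

print(files_to_scan([f1, f2], "id", 12))     # filter on an indexed column
print(files_to_scan([f1, f2], "note", "c"))  # filter on a non-indexed column
```

In this sketch the filter on `id` reads one file, while the filter on `note` has to read both because no statistics were collected for that column. This is why columns used in frequent filters are usually placed among the indexed ones.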

Question 2

A DLT pipeline includes the following streaming tables:

• raw_iot ingests raw device measurement data from a heart rate tracking device.

• bpm_stats incrementally computes user statistics based on BPM measurements from raw_iot.

How can the data engineer configure this pipeline to retain manually deleted or updated records in the raw_iot table, while recomputing the downstream bpm_stats table when a pipeline update is run?

A : Set the pipelines.reset.allowed property to false on raw_iot

B : Set the skipChangeCommits flag to true on raw_iot

C : Set the pipelines.reset.allowed property to false on bpm_stats

D : Set the skipChangeCommits flag to true on bpm_stats

Answer: A

Question 3

A data engineer is analyzing a Spark job via the Spark UI. They have the following summary metrics for 27 completed tasks in a particular stage:

Which conclusion can the data engineer draw from the above statistics?

A : All tasks are operating over partitions with even amounts of data

B : All tasks are operating over empty or near empty partitions

C : All tasks are operating over partitions with larger skewed amounts of data.

D : A number of tasks are operating over partitions with larger skewed amounts of data.

E : A number of tasks are operating over empty or near empty partitions

Answer: D
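The Spark UI stage page summarizes task metrics as minimum, 25th percentile, median, 75th percentile, and maximum. When a few partitions are skewed, the maximum sits far above the median while the lower quantiles stay close together. The sketch below illustrates that check in plain Python. The metrics table from the question is not available here, so the sample values, the function names, and the threshold of 5 are all hypothetical.

```python
from statistics import median


def skew_ratio(values):
    """Ratio of the largest per-task value to the median per-task value."""
    med = median(values)
    if med == 0:
        return float("inf") if max(values) > 0 else 1.0
    return max(values) / med


def looks_skewed(values, threshold=5.0):
    """Flag a stage when its largest task is `threshold` times the median."""
    return skew_ratio(values) >= threshold


# Hypothetical shuffle-read sizes in MB for 27 tasks (not the question's data).
even = [100] * 27
skewed = [100] * 24 + [900, 1200, 1500]

print(skew_ratio(even), looks_skewed(even))
print(skew_ratio(skewed), looks_skewed(skewed))
```

The same comparison can be made by eye in the UI: a maximum many times larger than the 75th percentile points at a small number of oversized partitions rather than uniformly large ones.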

Question 4

Which of the following describes the minimal permissions a data engineer needs to start and terminate an existing cluster?

A : “Can Attach To” privilege on the cluster

B : “Can Restart” privilege on the cluster

C : “Can Manage” privilege on the cluster

D : Cluster creation allowed + “Can Attach To” privileges on the cluster

E : Cluster creation allowed + “Can Restart” privileges on the cluster

Answer: B

Question 5

A data engineer, User A, has promoted a new pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens. Which statement describes the contents of the workspace audit logs concerning these events?

A : Because the REST API was used for job creation and triggering runs, a Service Principal will be automatically used to identify these events.

B : Because User B last configured the jobs, their identity will be associated with both the job creation events and the job run events.

C : Because these events are managed separately, User A will have their identity associated with the job creation events and User B will have their identity associated with the job run events.

D : Because the REST API was used for job creation and triggering runs, user identity will not be captured in the audit logs.

E : Because User A created the jobs, their identity will be associated with both the job creation events and the job run events.

Answer: C
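The scenario can be sketched as two independent REST calls, each carrying its own personal access token. The example below only builds the requests and does not send them. The workspace URL, token strings, job name, and job ID are hypothetical placeholders, and the payloads are trimmed to the bare minimum for illustration. The endpoints are the Jobs API 2.1 `jobs/create` and `jobs/run-now` paths.

```python
import json
import urllib.request


def build_jobs_request(host, token, endpoint, payload):
    """Build (but do not send) a POST request to a Jobs API 2.1 endpoint."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/{endpoint}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The bearer token identifies the caller of this specific request.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


HOST = "https://example.cloud.databricks.com"  # hypothetical workspace URL

# User A creates the job with their own token (placeholder value).
create_req = build_jobs_request(HOST, "user-a-token", "create", {"name": "nightly_etl"})

# User B triggers a run with a different token (placeholder value and job ID).
run_req = build_jobs_request(HOST, "user-b-token", "run-now", {"job_id": 123})

for req in (create_req, run_req):
    print(req.get_method(), req.full_url, req.get_header("Authorization"))
```

Because each request is authenticated separately, the identity attached to the creation call and the identity attached to the run call can differ, which is the situation answer C describes.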
