Free Amazon AWS-DEA-C01 Exam Questions

Question 1

A company has as JSON file that contains personally identifiable information (PIT) data and non-PII data. The company needs to make the data available for querying and analysis. The non-PII data must be available to everyone in the company. The PII data must be available only to a limited group of employees. Which solution will meet these requirements with the LEAST operational overhead?

A : Store the JSON file in an Amazon S3 bucket. Configure AWS Glue to split the file into one file that contains the PII data and one file that contains the non-PII data. Store the output files in separate S3 buckets. Grant the required access to the buckets based on the type of user.

B : Store the JSON file in an Amazon S3 bucket. Use Amazon Macie to identify PII data and to grant access based on the type of user.

C : Store the JSON file in an Amazon S3 bucket. Catalog the file schema in AWS Lake Formation. Use Lake Formation permissions to provide access to the required data based on the type of user.

D : Create two Amazon RDS PostgreSQL databases. Load the PII data and the non-PII data into the separate databases. Grant access to the databases based on the type of user.

Answer: C

Question 2

A company has three subsidiaries. Each subsidiary uses a different data warehousing solution. The firstsubsidiary hosts its data warehouse in Amazon Redshift. The second subsidiary uses Teradata Vantage onAWS. The third subsidiary uses Google BigQuery.The company wants to aggregate all the data into a central Amazon S3 data lake. The company wants to useApache Iceberg as the table format.A data engineer needs to build a new pipeline to connect to all the data sources, run transformations by usingeach source engine, join the data, and write the data to Iceberg.Which solution will meet these requirements with the LEAST operational effort?

A : Use native Amazon Redshift, Teradata, and BigQuery connectors to build the pipeline in AWS Glue. Use native AWS Glue transforms to join the data. Run a Merge operation on the data lake Iceberg table.

B : Use the Amazon Athena federated query connectors for Amazon Redshift, Teradata, and BigQuery to build the pipeline in Athena. Write a SQL query to read from all the data sources, join the data, and run a Merge operation on the data lake Iceberg table

C : Use the native Amazon Redshift connector, the Java Database Connectivity (JDBC) connector for Teradata, and the open source Apache Spark BigQuery connector to build the pipeline in Amazon EMR. Write code in PySpark to join the data. Run a Merge operation on the data lake Iceberg table.

D : Use the native Amazon Redshift, Teradata, and BigQuery connectors in Amazon Appflow to write data to Amazon S3 and AWS Glue Data Catalog. Use Amazon Athena to join the data. Run a Merge operation on the data lake Iceberg table.

Answer: B

Question 3

A data engineer needs to reformat newly acquired .csv data in an S3 bucket for more efficient analytics. The data includes timestamps, and older data is removed daily. What is the most cost-effective AWS solution to transform and optimize this data for query performance?

A : Set up a daily Lambda function to convert and partition the .csv data into Parquet format within S3.

B : Use AWS Glue to convert the .csv data to Apache Parquet and partition it by timestamp.

C : Configure an EMR Spark job to transform the .csv files into Parquet, partitioned by the creation timestamp.

D : Run an Athena CTAS query to convert the data to Parquet format with Snappy compression, partitioned by timestamp.

Answer: D

Question 4

A data engineer wants to orchestrate a set of extract, transform, and load (ETL) jobs that run on AWS. TheETL jobs contain tasks that must run Apache Spark jobs on Amazon EMR, make API calls to Salesforce, andload data into Amazon Redshift.The ETL jobs need to handle failures and retries automatically. The data engineer needs to use Python toorchestrate the jobs.Which service will meet these requirements?

A : Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

B : AWS Step Functions

C : AWS Glue

D : Amazon EventBridge

Answer: A

Question 5

A company is integrating a business intelligence (BI) tool with their data warehouse hosted on Microsoft SQL Server. The BI team requires regular data extracts to be transformed and stored in Amazon S3 for further analysis. The BI team needs a solution to manage this ETL process efficiently and at a low cost.

Which AWS service or feature is the most cost-effective for orchestrating an ETL pipeline that extracts data from Microsoft SQL Server, transforms it, and loads it into Amazon S3?

A : AWS Batch

B : AWS Glue DataBrew

C : AWS Data Pipeline

D : AWS Glue workflows

Answer: D