Become Amazon Certified with updated AWS-DEA-C01 exam questions and correct answers
A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers. A data engineer notices that one of the fields in the source data includes values that are in JSON format. How should the data engineer load the JSON data into the data warehouse with the LEAST effort?
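One low-effort way Redshift handles a JSON-valued field is its SUPER data type, which stores semistructured values natively so the COPY command can ingest them without flattening. A minimal sketch of the statements involved; the table name, column names, S3 path, and IAM role are hypothetical:

```python
# Sketch: loading a field that contains JSON into Redshift using a SUPER
# column. All identifiers below are illustrative assumptions.

def build_load_statements(table: str, json_column: str, s3_path: str, iam_role: str) -> dict:
    """Return the DDL and COPY statements for a table whose JSON field
    is stored natively in a SUPER column."""
    ddl = (
        f"CREATE TABLE {table} ("
        f"customer_id INT, "
        f"{json_column} SUPER)"            # SUPER holds the raw JSON value
    )
    copy = (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT JSON 'auto'"              # COPY maps JSON fields automatically
    )
    return {"ddl": ddl, "copy": copy}

stmts = build_load_statements(
    "staging_customers", "profile_json",
    "s3://example-bucket/third-party/",
    "arn:aws:iam::123456789012:role/RedshiftLoad",
)
print(stmts["ddl"])
print(stmts["copy"])
```

Because the JSON lands in a single SUPER column, no schema redesign or external parsing step is needed before the data is queryable with PartiQL-style paths.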
A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs. The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time. Which solution will MOST reduce the data processing time?
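The bottleneck with millions of 1 KB objects is per-file task overhead, which AWS Glue mitigates by coalescing small S3 objects into larger read tasks (the `groupFiles` reader option). A stdlib-only model of why that grouping shrinks the work; the file counts and the 128 MiB target are illustrative assumptions:

```python
# Sketch: packing many tiny files into large read groups, the effect the
# Glue "groupFiles" option has on task count. Numbers are illustrative.

def group_small_files(file_sizes_bytes, target_group_bytes):
    """Greedily pack files into groups of roughly target_group_bytes."""
    groups, current, current_size = [], [], 0
    for size in file_sizes_bytes:
        if current and current_size + size > target_group_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# 1,000,000 one-kilobyte test-result files, packed into ~128 MiB groups
files = [1024] * 1_000_000
groups = group_small_files(files, 128 * 1024 * 1024)
print(f"{len(files)} files -> {len(groups)} read tasks")
```

A million per-file tasks collapse into a handful of large reads, which is where most of the processing-time reduction comes from.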
A retail company receives a daily .xls file with customer data, which is uploaded to Amazon S3. The file size is about 2 GB. A data engineer is tasked with combining the customer first name and last name fields and then identifying the total number of unique customer entries in the file.
Which AWS service or feature should the data engineer use to determine the count of distinct customers with minimal operational effort?
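Whichever service performs it, the underlying computation is small: concatenate the first and last name fields, then count distinct combined values, which a single set-based pass accomplishes. A minimal sketch; the sample rows are hypothetical:

```python
# Sketch: combine first and last name, then count distinct customer
# entries with a set. The rows below are hypothetical sample data.

rows = [
    {"first_name": "Ana", "last_name": "Lopez"},
    {"first_name": "Ben", "last_name": "Okafor"},
    {"first_name": "Ana", "last_name": "Lopez"},   # duplicate entry
]

full_names = {f'{r["first_name"]} {r["last_name"]}' for r in rows}
distinct_customers = len(full_names)
print(distinct_customers)  # 2
```

The exam point is operational effort, not the computation itself: the question asks which service can run a combine-and-count-distinct transformation on a 2 GB .xls file with the least setup.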
A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB. The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files. The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline. The company needs to improve the performance of the second pipeline. Which solution will meet this requirement MOST cost-effectively?
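With 44,000 files of 200-300 KB each, the pipeline likely spends most of its time on per-object read overhead rather than compute, so a cost-effective fix is to have the Glue S3 source group small files per read task rather than to buy larger workers. A sketch of the reader options a Glue job can pass to its S3 source; the bucket path and group size are illustrative assumptions:

```python
# Sketch: S3 source options for an AWS Glue job that coalesce many small
# objects into larger read tasks. Path and group size are illustrative.

connection_options = {
    "paths": ["s3://example-sales-bucket/hourly/"],
    "groupFiles": "inPartition",          # group small files within each partition
    "groupSize": str(64 * 1024 * 1024),   # target roughly 64 MB per read task
    "recurse": True,                      # read the hourly subfolders
}
print(connection_options["groupSize"])
```

Because this changes only how the existing job reads its input, it improves throughput without adding workers, which is what makes it the cost-effective lever here.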
Your organization has deployed a series of IoT devices across its facilities to monitor environmental conditions. These devices send telemetry data every few seconds. As part of the data pipeline, you have been tasked with architecting a solution that ingests this streaming data, provides the ability to perform real-time analytics, and subsequently batches the data for storage in Amazon Redshift for further analysis. The solution should be scalable, manage large bursts of data effectively, and ensure that analytics can be performed promptly.
As a Cloud Data Engineering Consultant, which combination of AWS services would you employ to meet these requirements? (Select THREE)
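Whatever three services are chosen, the producer side of such a pipeline has a common shape: each telemetry reading gets a partition key (typically the device ID) so one device's records stay ordered on one shard, and records are hash-routed across shards to absorb bursts. A stdlib-only sketch of that shape; the hash scheme, shard count, and device data are illustrative assumptions, not any service's exact algorithm:

```python
# Sketch: shaping IoT telemetry into partition-keyed stream records and
# modeling hash-based shard routing. Device data and the MD5-mod routing
# here are illustrative assumptions.
import hashlib
import json

def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key onto a shard, as a hash-partitioned stream would."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % shard_count

readings = [
    {"device_id": "sensor-1", "temp_c": 21.4},
    {"device_id": "sensor-2", "temp_c": 19.8},
    {"device_id": "sensor-1", "temp_c": 21.6},
]

# One record per reading; the device ID is the partition key.
records = [
    {"Data": json.dumps(r), "PartitionKey": r["device_id"]}
    for r in readings
]

# Records sharing a partition key always land on the same shard.
shards = {rec["PartitionKey"]: shard_for(rec["PartitionKey"], 4) for rec in records}
print(shards)
```

Downstream, a real-time analytics layer consumes the shards directly, while a separate delivery component batches records before loading them into Redshift, matching the question's split between prompt analytics and batched storage.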
© Copyright DumpsCertify 2025. All Rights Reserved.