Become Amazon Certified with updated AWS-DEA-C01 exam questions and correct answers
As a Data Engineer, you are tasked with setting up a batch data ingestion pipeline for a large financial dataset. The data currently resides in an Amazon RDS database and originates from various financial services and trading platforms. The dataset is complex and heterogeneous, predominantly in JSON format, and is updated nightly, typically ranging from 100 GB to 200 GB per load.
Your primary objective is to move this data efficiently from the RDS database into a data lake on AWS, ensuring reliable storage and preparing the dataset for analytical processing with lightweight but sufficient transformation, given its complexity.
Which combination of AWS services would be most appropriate?
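The dump does not give the answer, but the usual pattern for this scenario is AWS Glue (or AWS DMS) landing nightly extracts into Amazon S3 as date-partitioned Parquet. As an illustration only, here is a small helper (hypothetical names and bucket) that builds the Hive-style partitioned S3 prefix such a nightly batch job would write to:

```python
from datetime import date

def nightly_partition_prefix(bucket: str, dataset: str, run_date: date) -> str:
    """Build a Hive-style partitioned S3 prefix for a nightly batch load.

    year=/month=/day= partitions let Athena and Glue prune partitions
    when querying the data lake.
    """
    return (
        f"s3://{bucket}/{dataset}/"
        f"year={run_date.year:04d}/month={run_date.month:02d}/day={run_date.day:02d}/"
    )

print(nightly_partition_prefix("finance-lake", "trades", date(2025, 1, 5)))
# → s3://finance-lake/trades/year=2025/month=01/day=05/
```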
A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket. The company wants to receive notifications when a user violates the data access policy. Each notification must include the username of the user who violated the policy. Which solution will meet these requirements?
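Solutions to this question typically combine AWS CloudTrail S3 data events with Amazon EventBridge and a notification target, because CloudTrail records carry the caller's identity. As a sketch only (the event shape below is a trimmed, hypothetical sample), this pure function shows how the username would be pulled from a CloudTrail record for the notification:

```python
def username_from_cloudtrail_event(event: dict) -> str:
    """Extract a human-readable username from a CloudTrail record.

    IAM users carry userName directly; assumed-role sessions expose the
    session name as the last segment of the ARN, so fall back to that.
    """
    identity = event.get("userIdentity", {})
    if "userName" in identity:
        return identity["userName"]
    # e.g. arn:aws:sts::123456789012:assumed-role/AnalystRole/alice
    return identity.get("arn", "unknown").rsplit("/", 1)[-1]

sample = {
    "eventName": "GetObject",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::123456789012:assumed-role/AnalystRole/alice",
    },
}
print(username_from_cloudtrail_event(sample))  # → alice
```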
A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB. The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files. The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline. The company needs to improve the performance of the second pipeline. Which solution will meet this requirement MOST cost-effectively?
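This is the classic small-files problem: 44,000 objects of 200–300 KB each create one Spark task per file. The cost-effective fix usually cited is AWS Glue's file grouping (the `groupFiles`/`groupSize` connection options), which coalesces many small S3 objects into fewer, larger read tasks. The greedy packing below is a local illustration of that idea (not Glue's actual implementation):

```python
def plan_file_groups(file_sizes_bytes, target_group_bytes=128 * 1024 * 1024):
    """Greedy bin-fill: pack small files into groups of roughly the target size.

    Mirrors the intent of AWS Glue's groupFiles/groupSize options, which
    turn thousands of tiny S3 objects into far fewer Spark tasks.
    """
    groups, current, current_size = [], [], 0
    for size in file_sizes_bytes:
        if current and current_size + size > target_group_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# 44,000 files of ~250 KB (~10.5 GB total) collapse into well under 100 groups
groups = plan_file_groups([250 * 1024] * 44_000)
print(len(groups))
```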
A company has a JSON file that contains personally identifiable information (PII) data and non-PII data. The company needs to make the data available for querying and analysis. The non-PII data must be available to everyone in the company. The PII data must be available only to a limited group of employees. Which solution will meet these requirements with the LEAST operational overhead?
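Low-overhead answers here usually rely on column-level separation, e.g. AWS Lake Formation column permissions or an Athena view that hides PII columns. To make the idea concrete, this hypothetical helper (field names are assumptions, not from the question) splits a record into a restricted PII view and a general view:

```python
PII_FIELDS = {"name", "email", "ssn"}  # assumed PII columns for illustration

def split_record(record: dict):
    """Split one JSON record into a restricted PII view and a general view.

    Column-level separation is what Lake Formation permissions or an
    Athena view effectively enforce for the two audiences.
    """
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    non_pii = {k: v for k, v in record.items() if k not in PII_FIELDS}
    return pii, non_pii

pii, general = split_record(
    {"name": "Alice", "email": "a@example.com", "trade_id": 42, "amount": 99.5}
)
print(general)  # → {'trade_id': 42, 'amount': 99.5}
```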
A data engineer wants to orchestrate a set of extract, transform, and load (ETL) jobs that run on AWS. The ETL jobs contain tasks that must run Apache Spark jobs on Amazon EMR, make API calls to Salesforce, and load data into Amazon Redshift. The ETL jobs need to handle failures and retries automatically. The data engineer needs to use Python to orchestrate the jobs. Which service will meet these requirements?
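The Python-based orchestrator usually intended by this question is Amazon Managed Workflows for Apache Airflow (MWAA), where each task declares a retry policy. The decorator below is a plain-Python sketch of those retry semantics, not MWAA/Airflow code; names and the simulated failure are illustrative:

```python
import time
from functools import wraps

def with_retries(max_attempts=3, delay_seconds=0.0):
    """Retry a task on failure, akin to an Airflow task's `retries` setting."""
    def decorator(task):
        @wraps(task)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return task(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # exhausted retries: surface the failure
                    time.sleep(delay_seconds)
        return wrapper
    return decorator

calls = {"n": 0}

@with_retries(max_attempts=3)
def flaky_load():
    # Simulated transient failure on the first two attempts
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient Redshift load error")
    return "loaded"

print(flaky_load())  # → loaded
```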
© DumpsCertify 2025. All Rights Reserved.