Become Amazon Certified with updated Data-Engineer-Associate exam questions and correct answers
A company has three subsidiaries. Each subsidiary uses a different data warehousing solution. The first subsidiary hosts its data warehouse in Amazon Redshift. The second subsidiary uses Teradata Vantage on AWS. The third subsidiary uses Google BigQuery.

The company wants to aggregate all the data into a central Amazon S3 data lake. The company wants to use Apache Iceberg as the table format.

A data engineer needs to build a new pipeline to connect to all the data sources, run transformations by using each source engine, join the data, and write the data to Iceberg.

Which solution will meet these requirements with the LEAST operational effort?
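One low-effort pattern for a scenario like this is to register each source as an Amazon Athena federated data source and write the joined result to an Iceberg table with a single CTAS statement. The sketch below is illustrative only, not the graded answer: the catalog names, table names, and join columns are hypothetical, and the AWS call is skipped unless `dry_run=False`.

```python
# Sketch: Athena federated query joining three external warehouses and
# writing the result to an Iceberg table via CTAS. All catalog, schema,
# table, and column names below are hypothetical.
import textwrap

ICEBERG_CTAS = textwrap.dedent("""\
    CREATE TABLE lake.sales_iceberg
    WITH (table_type = 'ICEBERG',
          location = 's3://example-data-lake/sales_iceberg/',
          format = 'PARQUET')
    AS
    SELECT r.order_id, r.amount, t.region, b.channel
    FROM redshift_catalog.sales.orders      r
    JOIN teradata_catalog.sales.territories t ON r.territory_id = t.id
    JOIN bigquery_catalog.sales.channels    b ON r.channel_id   = b.id
    """)

def submit(sql, workgroup="primary", dry_run=True):
    """Build (and optionally submit) the Athena query.

    dry_run=True returns the request instead of calling AWS, so the
    sketch can run without credentials.
    """
    if dry_run:
        return {"Sql": sql, "WorkGroup": workgroup}
    import boto3  # requires AWS credentials and configured connectors
    athena = boto3.client("athena")
    return athena.start_query_execution(QueryString=sql, WorkGroup=workgroup)

request = submit(ICEBERG_CTAS)
```

Because Athena is serverless and the connectors push work down to each source engine, this keeps the pipeline itself free of managed infrastructure.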
A company uses Amazon S3 as a data lake. The company sets up a data warehouse by using a multi-node Amazon Redshift cluster. The company organizes the data files in the data lake based on the data source of each data file.

The company loads all the data files into one table in the Redshift cluster by using a separate COPY command for each data file location. This approach takes a long time to load all the data files into the table. The company must increase the speed of the data ingestion. The company does not want to increase the cost of the process.

Which solution will meet these requirements?
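The usual remedy for many sequential per-location COPY commands is a single COPY that reads a manifest file, so the cluster's slices can fetch the listed files in parallel. The sketch below is a hedged illustration; the bucket paths, table name, and IAM role ARN are hypothetical, and the Redshift Data API call is skipped unless `dry_run=False`.

```python
# Sketch: one manifest-driven COPY instead of many per-prefix COPYs.
# All S3 paths, the table name, and the role ARN are hypothetical.
import json

# Manifest listing files from every source location; Redshift loads
# these entries in parallel across its slices.
manifest = {
    "entries": [
        {"url": "s3://example-lake/source_a/part-0000.csv", "mandatory": True},
        {"url": "s3://example-lake/source_b/part-0000.csv", "mandatory": True},
        {"url": "s3://example-lake/source_c/part-0000.csv", "mandatory": True},
    ]
}
manifest_json = json.dumps(manifest)  # upload this to S3 before the COPY

COPY_SQL = (
    "COPY analytics.events "
    "FROM 's3://example-lake/manifests/events.manifest' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
    "MANIFEST CSV;"
)

def run_copy(dry_run=True):
    """Return the COPY statement, or execute it via the Redshift Data API."""
    if dry_run:
        return COPY_SQL
    import boto3  # requires AWS credentials
    rs = boto3.client("redshift-data")
    return rs.execute_statement(
        ClusterIdentifier="example-cluster", Database="dev", Sql=COPY_SQL)
```

Since the same data volume is loaded on the same cluster, this speeds up ingestion without adding cost.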
A company has a JSON file that contains personally identifiable information (PII) data and non-PII data. The company needs to make the data available for querying and analysis. The non-PII data must be available to everyone in the company. The PII data must be available only to a limited group of employees.

Which solution will meet these requirements with the LEAST operational overhead?
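A low-overhead way to split access like this is AWS Lake Formation column-level permissions: grant the broad audience SELECT on all columns except the PII ones, and grant the limited group SELECT on the full table. The sketch below only builds the `grant_permissions` request (the database, table, column, and principal names are hypothetical) and skips the AWS call unless `dry_run=False`.

```python
# Sketch: Lake Formation column-level grant that hides PII columns from
# the company-wide principal. Names and ARNs below are hypothetical.

PII_COLUMNS = ["ssn", "email", "phone"]

def grant_non_pii(dry_run=True):
    """Grant SELECT on every column except the PII columns."""
    request = {
        "Principal": {
            "DataLakePrincipal": "arn:aws:iam::123456789012:role/AllEmployees"
        },
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": "analytics",
                "Name": "customers",
                # Column wildcard minus the PII columns: everything else
                # in the table stays queryable.
                "ColumnWildcard": {"ExcludedColumnNames": PII_COLUMNS},
            }
        },
        "Permissions": ["SELECT"],
    }
    if dry_run:
        return request
    import boto3  # requires AWS credentials and Lake Formation setup
    return boto3.client("lakeformation").grant_permissions(**request)

req = grant_non_pii()
```

A second, full-table grant to the restricted group's principal completes the picture; no views or duplicate datasets need to be maintained.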
A data engineer has two datasets that contain sales information for multiple cities and states. One dataset is named reference, and the other dataset is named primary. The data engineer needs a solution to determine whether a specific set of values in the city and state columns of the primary dataset exactly match the same specific values in the reference dataset. The data engineer wants to use Data Quality Definition Language (DQDL) rules in an AWS Glue Data Quality job. Which rule will meet these requirements?
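Cross-dataset checks like this are expressed in DQDL with a rule that compares the primary dataset's columns against a named reference dataset. A minimal sketch, assuming the reference dataset is registered in the job under the alias `reference` and the columns are literally `city` and `state`; verify the exact rule syntax against the DQDL reference before use.

```python
# Sketch: a DQDL ruleset string for an AWS Glue Data Quality job. The
# ReferentialIntegrity rule checks that city/state value pairs in the
# primary dataset all appear in the "reference" dataset (ratio = 1.0).

RULESET = """
Rules = [
    ReferentialIntegrity "city,state" "reference.{city,state}" = 1.0
]
"""
```

The `= 1.0` threshold requires a complete match; a lower ratio would tolerate partial overlap.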
A large e-commerce company is looking to improve the search and recommendation capabilities on its platform. The company's data engineering team has recently built a data lake on Amazon S3, consisting of user interaction logs, product catalog information, and transactional data. The data is ingested from various sources, including RDBMS exports, streamed clickstream data, and batch processed log files, resulting in diverse data formats such as JSON, CSV, and Parquet.
The team wants to leverage this data for advanced analytics and machine learning but is facing challenges in consistently cataloging and querying this data efficiently. Additionally, they need to manage frequently evolving schemas as new product attributes and user interaction types are introduced.
How should the team use AWS Glue to address these challenges?
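The standard Glue approach for this scenario is crawlers that populate the Data Catalog across the mixed JSON/CSV/Parquet prefixes, with a schema change policy that updates table definitions as new attributes appear. The sketch below only assembles the `create_crawler` request; the crawler name, role ARN, database, and S3 paths are hypothetical, and the AWS call is skipped unless `dry_run=False`.

```python
# Sketch: a Glue crawler over the data lake's prefixes, configured to
# absorb schema evolution. Names, ARN, and paths are hypothetical.

def crawler_request():
    return {
        "Name": "datalake-crawler",
        "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
        "DatabaseName": "ecommerce_lake",
        "Targets": {"S3Targets": [
            {"Path": "s3://example-lake/clickstream/"},
            {"Path": "s3://example-lake/catalog/"},
            {"Path": "s3://example-lake/transactions/"},
        ]},
        # Update existing table definitions when new columns appear;
        # log (rather than delete) when objects disappear.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }

def create_crawler(dry_run=True):
    """Build (and optionally submit) the crawler definition."""
    req = crawler_request()
    if dry_run:
        return req
    import boto3  # requires AWS credentials
    return boto3.client("glue").create_crawler(**req)

req = create_crawler()
```

Once cataloged, the same tables serve Athena, Glue ETL jobs, and ML feature pipelines without per-format custom readers.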
© Copyright DumpsCertify 2025. All Rights Reserved