DAS-C01 Exam - AWS Certified Data Analytics - Specialty


Pass4sure offers a free demo for the DAS-C01 exam. "AWS Certified Data Analytics - Specialty", also known as the DAS-C01 exam, is an Amazon Web Services certification. This set of posts, Passing the Amazon Web Services DAS-C01 Exam, will help you answer the exam questions. The DAS-C01 Questions & Answers cover all the knowledge points of the real exam, are based on 100% real Amazon Web Services DAS-C01 exams, and are revised by experts!

Check DAS-C01 free dumps before getting the full version:

NEW QUESTION 1
A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake.
The company runs analyses on the last 24 hours of data flow logs for abnormality detection and to troubleshoot and resolve user issues. The company also analyzes historical logs dating back 2 years to discover patterns and look for improvement opportunities.
The data flow logs contain many metrics, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day.
How should this data be stored for optimal performance?

  • A. In Apache ORC partitioned by date and sorted by source IP
  • B. In compressed .csv partitioned by date and sorted by source IP
  • C. In Apache Parquet partitioned by source IP and sorted by date
  • D. In compressed nested JSON partitioned by source IP and sorted by date

Answer: A
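
For context, here is a minimal PySpark sketch of how such a layout could be produced; the bucket, paths, and column names (event_date, source_ip) are assumptions for illustration and are not part of the question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flow-logs-to-orc").getOrCreate()

# Read the raw flow logs (hypothetical input location and schema).
logs = spark.read.json("s3://example-bucket/raw/flow-logs/")

# Write ORC files partitioned by date and sorted by source IP within each file,
# so the 24-hour queries touch only a small set of partitions.
(logs
    .repartition("event_date")
    .sortWithinPartitions("source_ip")
    .write
    .mode("append")
    .partitionBy("event_date")
    .orc("s3://example-bucket/curated/flow-logs-orc/"))
```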

NEW QUESTION 2
A company wants to optimize the cost of its data and analytics platform. The company is ingesting a number of .csv and JSON files into Amazon S3 from various data sources. Incoming data is expected to be 50 GB each day. The company is using Amazon Athena to query the raw data in Amazon S3 directly. Most queries aggregate data from the past 12 months, and data that is older than 5 years is infrequently queried. The typical query scans about 500 MB of data and is expected to return results in less than 1 minute. The raw data must be retained indefinitely for compliance requirements.
Which solution meets the company’s requirements?

  • A. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation.
  • B. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation.
  • C. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed.
  • D. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed.

Answer: A
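
As a rough illustration of the lifecycle part of option A, the boto3 sketch below transitions processed data to S3 Standard-IA after 5 years and raw data to S3 Glacier after 7 days; the bucket name and prefixes are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket layout: processed data under processed/, raw data under raw/.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "processed-to-standard-ia-after-5-years",
                "Filter": {"Prefix": "processed/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 1825, "StorageClass": "STANDARD_IA"}],
            },
            {
                "ID": "raw-to-glacier-after-7-days",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
            },
        ]
    },
)
```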

NEW QUESTION 3
A hospital uses wearable medical sensor devices to collect data from patients. The hospital is architecting a near-real-time solution that can ingest the data securely at scale. The solution should also be able to remove the patient’s protected health information (PHI) from the streaming data and store the data in durable storage.
Which solution meets these requirements with the least operational overhead?

  • A. Ingest the data using Amazon Kinesis Data Streams, which invokes an AWS Lambda function using the Kinesis Client Library (KCL) to remove all PHI. Write the data in Amazon S3.
  • B. Ingest the data using Amazon Kinesis Data Firehose to write the data to Amazon S3. Have Amazon S3 trigger an AWS Lambda function that parses the sensor data to remove all PHI in Amazon S3.
  • C. Ingest the data using Amazon Kinesis Data Streams to write the data to Amazon S3. Have the data stream launch an AWS Lambda function that parses the sensor data and removes all PHI in Amazon S3.
  • D. Ingest the data using Amazon Kinesis Data Firehose to write the data to Amazon S3. Implement a transformation AWS Lambda function that parses the sensor data to remove all PHI.

Answer: D

Explanation:
https://aws.amazon.com/blogs/big-data/persist-streaming-data-to-amazon-s3-using-amazon-kinesis-firehose-and
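
For illustration, here is a minimal sketch of a Kinesis Data Firehose transformation Lambda function that strips assumed PHI fields before Firehose delivers the records to Amazon S3; the field names are placeholders.

```python
import base64
import json

# Field names are assumptions for illustration only.
PHI_FIELDS = {"patient_name", "patient_id", "date_of_birth"}

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        cleaned = {key: value for key, value in payload.items() if key not in PHI_FIELDS}
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(cleaned).encode("utf-8")).decode("utf-8"),
        })
    # Firehose writes the transformed records to S3; no custom delivery code is needed.
    return {"records": output}
```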

NEW QUESTION 4
A company has an application that uses the Amazon Kinesis Client Library (KCL) to read records from a Kinesis data stream.
After a successful marketing campaign, the application experienced a significant increase in usage. As a result, a data analyst had to split some shards in the data stream. When the shards were split, the application sporadically started throwing ExpiredIteratorException errors.
What should the data analyst do to resolve this?

  • A. Increase the number of threads that process the stream records.
  • B. Increase the provisioned read capacity units assigned to the stream’s Amazon DynamoDB table.
  • C. Increase the provisioned write capacity units assigned to the stream’s Amazon DynamoDB table.
  • D. Decrease the provisioned write capacity units assigned to the stream’s Amazon DynamoDB table.

Answer: C

NEW QUESTION 5
A company that monitors weather conditions from remote construction sites is setting up a solution to collect temperature data from the following two weather stations.
  • Station A, which has 10 sensors
  • Station B, which has five sensors
These weather stations were placed by onsite subject-matter experts.
Each sensor has a unique ID. The data collected from each sensor will be collected using Amazon Kinesis Data Streams.
Based on the total incoming and outgoing data throughput, a single Amazon Kinesis data stream with two shards is created. Two partition keys are created based on the station names. During testing, there is a bottleneck on data coming from Station A, but not from Station B. Upon review, it is confirmed that the total stream throughput is still less than the allocated Kinesis Data Streams throughput.
How can this bottleneck be resolved without increasing the overall cost and complexity of the solution, while retaining the data collection quality requirements?

  • A. Increase the number of shards in Kinesis Data Streams to increase the level of parallelism.
  • B. Create a separate Kinesis data stream for Station A with two shards, and stream Station A sensor data to the new stream.
  • C. Modify the partition key to use the sensor ID instead of the station name.
  • D. Reduce the number of sensors in Station A from 10 to 5 sensors.

Answer: C

Explanation:
https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html
"Splitting increases the number of shards in your stream and therefore increases the data capacity of the stream. Because you are charged on a per-shard basis, splitting increases the cost of your stream"

NEW QUESTION 6
A large retailer has successfully migrated to an Amazon S3 data lake architecture. The company’s marketing team is using Amazon Redshift and Amazon QuickSight to analyze data, and derive and visualize insights. To ensure the marketing team has the most up-to-date actionable information, a data analyst implements nightly refreshes of Amazon Redshift using terabytes of updates from the previous day.
After the first nightly refresh, users report that half of the most popular dashboards that had been running correctly before the refresh are now running much slower. Amazon CloudWatch does not show any alerts.
What is the MOST likely cause for the performance degradation?

  • A. The dashboards are suffering from inefficient SQL queries.
  • B. The cluster is undersized for the queries being run by the dashboards.
  • C. The nightly data refreshes are causing a lingering transaction that cannot be automatically closed by Amazon Redshift due to ongoing user workloads.
  • D. The nightly data refreshes left the dashboard tables in need of a vacuum operation that could not be automatically performed by Amazon Redshift due to ongoing user workloads.

Answer: D

Explanation:
https://github.com/awsdocs/amazon-redshift-developer-guide/issues/21
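
If the vacuum has to be run manually after the nightly refresh, a sketch like the one below (using the Redshift Data API; cluster, database, user, and table names are placeholders) could be scheduled once user workloads quiet down.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Reclaim space and refresh table statistics after the nightly load.
for statement in ("VACUUM dashboard.sales_daily;", "ANALYZE dashboard.sales_daily;"):
    redshift_data.execute_statement(
        ClusterIdentifier="reporting-cluster",  # placeholder cluster
        Database="analytics",
        DbUser="maintenance_user",
        Sql=statement,
    )
```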

NEW QUESTION 7
A company has a marketing department and a finance department. The departments are storing data in Amazon S3 in their own AWS accounts in AWS Organizations. Both departments use AWS Lake Formation to catalog and secure their data. The departments have some databases and tables that share common names.
The marketing department needs to securely access some tables from the finance department. Which two steps are required for this process? (Choose two.)

  • A. The finance department grants Lake Formation permissions for the tables to the external account for the marketing department.
  • B. The finance department creates cross-account IAM permissions to the table for the marketing department role.
  • C. The marketing department creates an IAM role that has permissions to the Lake Formation tables.

Answer: AB

Explanation:
See the AWS documentation topics "Granting Lake Formation Permissions" and "Creating an IAM Role (AWS CLI)".
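
A hedged boto3 sketch of the grant in option A, issued from the finance (owning) account to the marketing account; all account IDs, database, and table names are placeholders.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Finance account grants SELECT on one table to the marketing account.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "222222222222"},  # marketing account ID
    Resource={
        "Table": {
            "CatalogId": "111111111111",   # finance (owning) account ID
            "DatabaseName": "finance_db",
            "Name": "quarterly_spend",
        }
    },
    Permissions=["SELECT"],
)
```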

NEW QUESTION 8
A transportation company uses IoT sensors attached to trucks to collect vehicle data for its global delivery fleet. The company currently sends the sensor data in small .csv files to Amazon S3. The files are then loaded into a 10-node Amazon Redshift cluster with two slices per node and queried using both Amazon Athena and Amazon Redshift. The company wants to optimize the files to reduce the cost of querying and also improve the speed of data loading into the Amazon Redshift cluster.
Which solution meets these requirements?

  • A. Use AWS Glue to convert all the files from .csv to a single large Apache Parquet file. COPY the file into Amazon Redshift and query the file with Athena from Amazon S3.
  • B. Use Amazon EMR to convert each .csv file to Apache Avro. COPY the files into Amazon Redshift and query the file with Athena from Amazon S3.
  • C. Use AWS Glue to convert the files from .csv to a single large Apache ORC file. COPY the file into Amazon Redshift and query the file with Athena from Amazon S3.
  • D. Use AWS Glue to convert the files from .csv to Apache Parquet to create 20 Parquet files. COPY the files into Amazon Redshift and query the files with Athena from Amazon S3.

Answer: D
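
To illustrate the sizing logic in option D (10 nodes x 2 slices = 20 files, so every slice participates in the COPY), here is a rough PySpark conversion sketch; the paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-20-parquet-files").getOrCreate()

sensor_data = spark.read.option("header", "true").csv("s3://example-bucket/iot/raw-csv/")

# 20 output files = 10 nodes x 2 slices, so each Redshift slice loads one file in parallel.
(sensor_data
    .repartition(20)
    .write
    .mode("overwrite")
    .parquet("s3://example-bucket/iot/parquet/"))
```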

NEW QUESTION 9
A company analyzes its data in an Amazon Redshift data warehouse, which currently has a cluster of three dense storage nodes. Due to a recent business acquisition, the company needs to load an additional 4 TB of user data into Amazon Redshift. The engineering team will combine all the user data and apply complex calculations that require I/O intensive resources. The company needs to adjust the cluster's capacity to support the change in analytical and storage requirements.
Which solution meets these requirements?

  • A. Resize the cluster using elastic resize with dense compute nodes.
  • B. Resize the cluster using classic resize with dense compute nodes.
  • C. Resize the cluster using elastic resize with dense storage nodes.
  • D. Resize the cluster using classic resize with dense storage nodes.

Answer: C

NEW QUESTION 10
A data analytics specialist is building an automated ETL ingestion pipeline using AWS Glue to ingest compressed files that have been uploaded to an Amazon S3 bucket. The ingestion pipeline should support incremental data processing.
Which AWS Glue feature should the data analytics specialist use to meet this requirement?

  • A. Workflows
  • B. Triggers
  • C. Job bookmarks
  • D. Classifiers

Answer: C
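
A minimal AWS Glue job sketch that relies on job bookmarks for incremental processing; bookmarks must also be enabled on the job (--job-bookmark-option job-bookmark-enable), and the S3 paths and data format are assumptions.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# transformation_ctx is the key that bookmark state is tracked against between runs.
incoming = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/uploads/"]},
    format="json",
    transformation_ctx="incoming",
)

glue_context.write_dynamic_frame.from_options(
    frame=incoming,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/processed/"},
    format="parquet",
    transformation_ctx="processed",
)

# Commit persists the bookmark so the next run only picks up new objects.
job.commit()
```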

NEW QUESTION 11
A retail company has 15 stores across 6 cities in the United States. Once a month, the sales team requests a visualization in Amazon QuickSight that provides the ability to easily identify revenue trends across cities and stores. The visualization also helps identify outliers that need to be examined with further analysis.
Which visual type in QuickSight meets the sales team's requirements?

  • A. Geospatial chart
  • B. Line chart
  • C. Heat map
  • D. Tree map

Answer: A

NEW QUESTION 12
A company leverages Amazon Athena for ad-hoc queries against data stored in Amazon S3. The company wants to implement additional controls to separate query execution and query history among users, teams, or applications running in the same AWS account to comply with internal security policies.
Which solution meets these requirements?

  • A. Create an S3 bucket for each given use case, create an S3 bucket policy that grants permissions to appropriate individual IAM users, and apply the S3 bucket policy to the S3 bucket.
  • B. Create an Athena workgroup for each given use case, apply tags to the workgroup, and create an IAM policy using the tags to apply appropriate permissions to the workgroup.
  • C. Create an IAM role for each given use case, assign appropriate permissions to the role for the given use case, and add the role to associate the role with Athena.
  • D. Create an AWS Glue Data Catalog resource policy for each given use case that grants permissions to appropriate individual IAM users, and apply the resource policy to the specific tables used by Athena.

Answer: B

Explanation:
https://docs.aws.amazon.com/athena/latest/ug/user-created-workgroups.html
Amazon Athena workgroups are a resource type that can be used to separate query execution and query history between users, teams, or applications running under the same AWS account.
https://aws.amazon.com/about-aws/whats-new/2019/02/athena_workgroups/
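
A sketch of the workgroup setup in option B; the workgroup name, tag, bucket, and query are placeholders. An IAM policy can then scope athena:StartQueryExecution and related actions using the aws:ResourceTag/team condition key.

```python
import boto3

athena = boto3.client("athena")

# One tagged workgroup per team keeps query execution and history separate.
athena.create_work_group(
    Name="marketing-adhoc",
    Configuration={
        "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/marketing/"},
        "PublishCloudWatchMetricsEnabled": True,
    },
    Tags=[{"Key": "team", "Value": "marketing"}],
)

# Queries submitted against this workgroup are logged and listed under it.
athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM sales_raw",
    QueryExecutionContext={"Database": "analytics"},
    WorkGroup="marketing-adhoc",
)
```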

NEW QUESTION 13
A financial company uses Apache Hive on Amazon EMR for ad-hoc queries. Users are complaining of sluggish performance.
A data analyst notes the following:
  • Approximately 90% of queries are submitted 1 hour after the market opens.
  • Hadoop Distributed File System (HDFS) utilization never exceeds 10%.
Which solution would help address the performance issues?

  • A. Create instance fleet configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance fleet based on the CloudWatch CapacityRemainingGB metric.
  • B. Create instance fleet configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch YARNMemoryAvailablePercentage metric. Create an automatic scaling policy to scale in the instance fleet based on the CloudWatch YARNMemoryAvailablePercentage metric.
  • C. Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch CapacityRemainingGB metric.
  • D. Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch YARNMemoryAvailablePercentage metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch YARNMemoryAvailablePercentage metric.

Answer: D

Explanation:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html
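
As a rough sketch of option D, the boto3 call below attaches an automatic scaling rule (supported for instance groups, not instance fleets) that scales out when YARNMemoryAvailablePercentage drops; the cluster ID, instance group ID, and thresholds are assumptions.

```python
import boto3

emr = boto3.client("emr")

emr.put_auto_scaling_policy(
    ClusterId="j-EXAMPLECLUSTER",
    InstanceGroupId="ig-EXAMPLETASKGROUP",
    AutoScalingPolicy={
        "Constraints": {"MinCapacity": 2, "MaxCapacity": 20},
        "Rules": [
            {
                "Name": "ScaleOutOnLowYarnMemory",
                "Action": {
                    "SimpleScalingPolicyConfiguration": {
                        "AdjustmentType": "CHANGE_IN_CAPACITY",
                        "ScalingAdjustment": 2,   # add two nodes per trigger
                        "CoolDown": 300,
                    }
                },
                "Trigger": {
                    "CloudWatchAlarmDefinition": {
                        "ComparisonOperator": "LESS_THAN",
                        "EvaluationPeriods": 1,
                        "MetricName": "YARNMemoryAvailablePercentage",
                        "Namespace": "AWS/ElasticMapReduce",
                        "Period": 300,
                        "Statistic": "AVERAGE",
                        "Threshold": 15.0,        # assumed threshold
                        "Unit": "PERCENT",
                    }
                },
            }
        ],
    },
)
```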

NEW QUESTION 14
An online retail company with millions of users around the globe wants to improve its ecommerce analytics capabilities. Currently, clickstream data is uploaded directly to Amazon S3 as compressed files. Several times each day, an application running on Amazon EC2 processes the data and makes search options and reports available for visualization by editors and marketers. The company wants to make website clicks and aggregated data available to editors and marketers in minutes to enable them to connect with users more effectively.
Which options will help meet these requirements in the MOST efficient way? (Choose two.)

  • A. Use Amazon Kinesis Data Firehose to upload compressed and batched clickstream records to Amazon Elasticsearch Service.
  • B. Upload clickstream records to Amazon S3 as compressed files. Then use AWS Lambda to send data to Amazon Elasticsearch Service from Amazon S3.
  • C. Use Amazon Elasticsearch Service deployed on Amazon EC2 to aggregate, filter, and process the data. Refresh content performance dashboards in near-real time.
  • D. Use Kibana to aggregate, filter, and visualize the data stored in Amazon Elasticsearch Service. Refresh content performance dashboards in near-real time.
  • E. Upload clickstream records from Amazon S3 to Amazon Kinesis Data Streams and use a Kinesis Data Streams consumer to send records to Amazon Elasticsearch Service.

Answer: AD
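
A small sketch of the ingest side of option A: publishing a clickstream event to a Kinesis Data Firehose delivery stream that is configured (outside this snippet) to deliver into Amazon Elasticsearch Service; the stream name and payload are placeholders.

```python
import json
import boto3

firehose = boto3.client("firehose")

# Firehose batches and compresses these records before delivering them to the
# Elasticsearch domain, where Kibana dashboards pick them up within minutes.
firehose.put_record(
    DeliveryStreamName="clickstream-to-es",
    Record={"Data": (json.dumps({"page": "/checkout", "user_id": "u-123"}) + "\n").encode("utf-8")},
)
```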

NEW QUESTION 15
An insurance company has raw data in JSON format that is sent without a predefined schedule through an Amazon Kinesis Data Firehose delivery stream to an Amazon S3 bucket. An AWS Glue crawler is scheduled to run every 8 hours to update the schema in the data catalog of the tables stored in the S3 bucket. Data analysts analyze the data using Apache Spark SQL on Amazon EMR set up with AWS Glue Data Catalog as the metastore. Data analysts say that, occasionally, the data they receive is stale. A data engineer needs to provide access to the most up-to-date data.
Which solution meets these requirements?

  • A. Create an external schema based on the AWS Glue Data Catalog on the existing Amazon Redshift cluster to query new data in Amazon S3 with Amazon Redshift Spectrum.
  • B. Use Amazon CloudWatch Events with the rate (1 hour) expression to execute the AWS Glue crawler every hour.
  • C. Using the AWS CLI, modify the execution schedule of the AWS Glue crawler from 8 hours to 1 minute.
  • D. Run the AWS Glue crawler from an AWS Lambda function triggered by an S3:ObjectCreated:* event notification on the S3 bucket.

Answer: D

Explanation:
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
"you can use a wildcard (for example, s3:ObjectCreated:*) to request notification when an object is created regardless of the API used"
"AWS Lambda can run custom code in response to Amazon S3 bucket events. You upload your custom code to AWS Lambda and create what is called a Lambda function. When Amazon S3 detects an event of a specific type (for example, an object created event), it can publish the event to AWS Lambda and invoke your function in Lambda. In response, AWS Lambda runs your function."
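
A minimal sketch of the Lambda handler in option D, triggered by the bucket's s3:ObjectCreated:* notification; the crawler name is a placeholder.

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Kick off the existing crawler so newly delivered objects are cataloged quickly.
    try:
        glue.start_crawler(Name="raw-json-crawler")
    except glue.exceptions.CrawlerRunningException:
        pass  # a crawl is already in progress; the new objects will be picked up
    return {"records_received": len(event.get("Records", []))}
```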

NEW QUESTION 16
An airline has been collecting metrics on flight activities for analytics. A recently completed proof of concept demonstrates how the company provides insights to data analysts to improve on-time departures. The proof of concept used objects in Amazon S3, which contained the metrics in .csv format, and used Amazon Athena for querying the data. As the amount of data increases, the data analyst wants to optimize the storage solution to improve query performance.
Which options should the data analyst use to improve performance as the data lake grows? (Choose three.)

  • A. Add a randomized string to the beginning of the keys in S3 to get more throughput across partitions.
  • B. Use an S3 bucket in the same account as Athena.
  • C. Compress the objects to reduce the data transfer I/O.
  • D. Use an S3 bucket in the same Region as Athena.
  • E. Preprocess the .csv data to JSON to reduce I/O by fetching only the document keys needed by the query.
  • F. Preprocess the .csv data to Apache Parquet to reduce I/O by fetching only the data blocks needed for predicates.

Answer: CDF

Explanation:
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
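
As one way to apply option F, the sketch below runs an Athena CTAS statement that rewrites the .csv data as compressed Parquet; the database, table names, and S3 locations are placeholders.

```python
import boto3

athena = boto3.client("athena")

# CTAS rewrite: Parquet with Snappy compression so queries read only the needed column blocks.
ctas = """
CREATE TABLE flights.metrics_parquet
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://example-bucket/flights/parquet/'
) AS
SELECT * FROM flights.metrics_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "flights"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```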

NEW QUESTION 17
A company is hosting an enterprise reporting solution with Amazon Redshift. The application provides reporting capabilities to three main groups: an executive group to access financial reports, a data analyst group to run long-running ad-hoc queries, and a data engineering group to run stored procedures and ETL processes. The executive team requires queries to run with optimal performance. The data engineering team expects queries to take minutes.
Which Amazon Redshift feature meets the requirements for this task?

  • A. Concurrency scaling
  • B. Short query acceleration (SQA)
  • C. Workload management (WLM)
  • D. Materialized views

Answer: C

Explanation:
Workload management (WLM) lets each group's queries run in its own queue with its own concurrency and priority, so executive reports get consistently fast service while long-running ad hoc and ETL workloads are isolated in separate queues.

NEW QUESTION 18
An online retailer is rebuilding its inventory management system and inventory reordering system to automatically reorder products by using Amazon Kinesis Data Streams. The inventory management system uses the Kinesis Producer Library (KPL) to publish data to a stream. The inventory reordering system uses the Kinesis Client Library (KCL) to consume data from the stream. The stream has been configured to scale as needed. Just before production deployment, the retailer discovers that the inventory reordering system is receiving duplicated data.
Which factors could be causing the duplicated data? (Choose two.)

  • A. The producer has a network-related timeout.
  • B. The stream’s value for the IteratorAgeMilliseconds metric is too high.
  • C. There was a change in the number of shards, record processors, or both.
  • D. The AggregationEnabled configuration property was set to true.
  • E. The max_records configuration property was set to a number that is too high.

Answer: AC

NEW QUESTION 19
A company’s marketing team has asked for help in identifying a high performing long-term storage service for their data based on the following requirements:
  • The data size is approximately 32 TB uncompressed.
  • There is a low volume of single-row inserts each day.
  • There is a high volume of aggregation queries each day.
  • Multiple complex joins are performed.
  • The queries typically involve a small subset of the columns in a table.
Which storage service will provide the MOST performant solution?

  • A. Amazon Aurora MySQL
  • B. Amazon Redshift
  • C. Amazon Neptune
  • D. Amazon Elasticsearch

Answer: B

NEW QUESTION 20
A company has a data lake on AWS that ingests sources of data from multiple business units and uses Amazon Athena for queries. The storage layer is Amazon S3 using the AWS Glue Data Catalog. The company wants to make the data available to its data scientists and business analysts. However, the company first needs to manage data access for Athena based on user roles and responsibilities.
What should the company do to apply these access controls with the LEAST operational overhead?

  • A. Define security policy-based rules for the users and applications by role in AWS Lake Formation.
  • B. Define security policy-based rules for the users and applications by role in AWS Identity and Access Management (IAM).
  • C. Define security policy-based rules for the tables and columns by role in AWS Glue.
  • D. Define security policy-based rules for the tables and columns by role in AWS Identity and Access Management (IAM).

Answer: A

NEW QUESTION 21
......

P.S. Easily pass the DAS-C01 exam with the 130 Q&As in the Thedumpscentre.com dumps & PDF version. You are welcome to download the newest Thedumpscentre.com DAS-C01 dumps: https://www.thedumpscentre.com/DAS-C01-dumps/ (130 new questions)