Browse DBS Questions
Study all 100 questions at your own pace with detailed explanations
Total: 100 questions | Page 10 of 10
Question 91 of 100
A company is building a new application in AWS. The architect needs to design a system to collect application log events. The design should be a repeatable pattern that minimizes data loss if an application instance fails, and keeps a durable copy of the log data for at least 30 days. What is the simplest architecture that will allow the architect to analyze the logs?
A. Write them directly to a Kinesis Firehose. Configure Kinesis Firehose to load the events into an Amazon Redshift cluster for analysis.
B. Write them to a file on Amazon Simple Storage Service (S3). Write an AWS Lambda function that runs in response to the S3 event to load the events into Amazon Elasticsearch Service for analysis.
C. Write them to the local disk and configure the Amazon CloudWatch Logs agent to load the data into CloudWatch Logs and subsequently into Amazon Elasticsearch Service.
D. Write them to CloudWatch Logs and use an AWS Lambda function to load them into HDFS on an Amazon Elastic MapReduce (EMR) cluster for analysis.
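For reference, the 30-day durability requirement maps directly to a CloudWatch Logs retention setting as in option C. A minimal boto3 sketch, assuming a hypothetical log group name that the agent on each instance would ship to:

```python
import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Keep a durable copy of the log events for at least 30 days.
logs.put_retention_policy(
    logGroupName="/myapp/application-events",  # placeholder log group
    retentionInDays=30,
)
```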
Question 92 of 100
A large oil and gas company needs to provide near real-time alerts when peak thresholds are exceeded in its pipeline system. The company has developed a system to capture pipeline metrics such as flow rate, pressure, and temperature using millions of sensors. The sensors deliver their data to AWS IoT. What is a cost-effective way to provide near real-time alerts on the pipeline metrics?
A. Create an AWS IoT rule to generate an Amazon SNS notification.
B. Store the data points in an Amazon DynamoDB table and poll it for peak metrics data from an Amazon EC2 application.
C. Create an Amazon Machine Learning model and invoke it with AWS Lambda.
D. Use Amazon Kinesis Streams and a KCL-based application deployed on AWS Elastic Beanstalk.
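Option A corresponds to an AWS IoT topic rule that fires an SNS notification when a threshold is crossed. A minimal boto3 sketch of that pattern, where the topic filter, threshold, and ARNs are placeholders:

```python
import boto3

iot = boto3.client("iot", region_name="us-east-1")

# Hypothetical rule: alert when pipeline pressure exceeds a peak threshold.
iot.create_topic_rule(
    ruleName="pipeline_pressure_alert",
    topicRulePayload={
        "sql": "SELECT * FROM 'pipeline/metrics' WHERE pressure > 300",
        "actions": [{
            "sns": {
                "targetArn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
                "roleArn": "arn:aws:iam::123456789012:role/iot-sns-role",
                "messageFormat": "RAW",
            }
        }],
    },
)
```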
Question 93 of 100
You need real-time reporting on logs generated from your applications. In addition, you need anomaly detection. The processing latency needs to be one second or less. Which option would you choose if your team has no experience with machine learning libraries and doesn't want to maintain any software installations itself?
A. Kinesis Streams with Kinesis Analytics
B. Kafka
C. Kinesis Firehose to S3 and Athena
D. Spark Streaming with SparkSQL and MLlib
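Option A relies on Kinesis Analytics' built-in RANDOM_CUT_FOREST function for anomaly detection, so no machine learning library has to be installed or maintained. A sketch of the Analytics application SQL wrapped in a boto3 call, assuming placeholder application and stream names:

```python
import boto3

# Kinesis Analytics SQL: score each record with RANDOM_CUT_FOREST.
analytics_sql = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (metric DOUBLE, anomaly_score DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM metric, ANOMALY_SCORE
  FROM TABLE(RANDOM_CUT_FOREST(CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")));
"""

kda = boto3.client("kinesisanalytics", region_name="us-east-1")
kda.create_application(
    ApplicationName="log-anomaly-detection",
    ApplicationCode=analytics_sql,
)
```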
Question 94 of 100
A city has been collecting data on its public bicycle share program for the past three years. The 5PB dataset currently resides on Amazon S3. The data contains the following data points: bicycle origination points; bicycle destination points; mileage between the points; number of bicycle slots available at the station (which is variable based on the station location); and number of slots available and taken at a given time. The program has received additional funds to increase the number of bicycle stations available. All data is regularly archived to Amazon Glacier. The new bicycle stations must be located to provide the most riders access to bicycles. How should this task be performed?
A. Move the data from Amazon S3 into Amazon EBS-backed volumes and use an EC2-based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization.
B. Use the Amazon Redshift COPY command to move the data from Amazon S3 into Redshift and perform a SQL query that outputs the most popular bicycle stations.
C. Persist the data on Amazon S3 and use a transient EMR cluster with spot instances to run a Spark streaming job that will move the data into Amazon Kinesis.
D. Keep the data on Amazon S3 and use an Amazon EMR-based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization over EMRFS.
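Option D keeps the dataset in place on Amazon S3 and lets Spark on EMR read it through EMRFS. A minimal PySpark sketch, assuming hypothetical bucket paths and column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bike-station-siting").getOrCreate()

# EMRFS lets the job read the dataset directly from S3; no copy step is needed.
rides = spark.read.csv("s3://city-bike-share/rides/", header=True, inferSchema=True)

# Hypothetical aggregation used as input to the station-placement optimization.
demand = rides.groupBy("origin_station", "destination_station").count()
demand.write.parquet("s3://city-bike-share/station-demand/")
```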
Question 95 of 100
A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500MB to 5GB. These files are processed daily by an EMR job. Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job. Which recommendation should an administrator provide?
A. Reduce the HDFS block size to increase the number of task processors.
B. Use bzip2 or Snappy rather than gzip for the archives.
C. Decompress the gzip archives and store the data as CSV files.
D. Use Avro rather than gzip for the archives.
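The tuning issue hinges on whether the compression codec is splittable: a single large gzip file is read by one task, while a splittable codec lets the input be processed in parallel. A hypothetical PySpark re-encoding job, with placeholder paths, might look like:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("reencode-depletion-reports").getOrCreate()

# gzip is not splittable, so each daily archive is read by a single task.
reports = spark.read.csv("s3://distributor-reports/daily/*.csv.gz", header=True)

# Re-write with a splittable codec so downstream jobs can parallelize across splits.
reports.write.option("compression", "bzip2").csv("s3://distributor-reports/daily-bzip2/")
```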
Question 96 of 100
An administrator is deploying Spark on Amazon EMR for two distinct use cases: machine learning algorithms and ad-hoc querying. All data will be stored in Amazon S3. A separate cluster will be deployed for each use case. The data volumes on Amazon S3 are less than 10 GB. How should the administrator align instance types with each cluster's purpose?
A. Machine Learning on C instance types and ad-hoc queries on R instance types
B. Machine Learning on R instance types and ad-hoc queries on G2 instance types
C. Machine Learning on T instance types and ad-hoc queries on M instance types
D. Machine Learning on D instance types and ad-hoc queries on I instance types
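Whichever instance family is picked for each workload, the choice is applied per cluster at launch time. A boto3 sketch showing where that knob lives, with placeholder release label, roles, and instance types:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Hypothetical cluster launch; the InstanceType fields are where the family
# choice (C, R, etc.) is applied for each of the two clusters.
emr.run_job_flow(
    Name="spark-cluster-1",
    ReleaseLabel="emr-5.36.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "r5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "r5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```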
Question 97 of 100
A customer's nightly EMR job processes a single 2-TB data file stored on Amazon Simple Storage Service (S3). The Amazon Elastic MapReduce (EMR) job runs on two On-Demand core nodes and three On-Demand task nodes. Which of the following may help reduce the EMR job completion time? Choose 2 answers.
A. Use three Spot Instances rather than three On-Demand instances for the task nodes.
B. Change the input split size in the MapReduce job configuration.
C. Use a bootstrap action to present the S3 bucket as a local filesystem.
D. Launch the core nodes and task nodes within an Amazon Virtual Private Cloud (VPC).
E. Adjust the number of simultaneous mapper tasks.
F. Enable termination protection for the job flow.
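Two of the options are Hadoop tuning knobs (input split size and the number of map tasks) that can be set through an EMR configuration classification. A sketch of such a classification block, with values that are illustrative only:

```python
# Hypothetical EMR "Configurations" entry; it would be passed to
# run_job_flow(..., Configurations=configurations).
configurations = [
    {
        "Classification": "mapred-site",
        "Properties": {
            # Smaller splits -> more input splits -> more parallel map tasks.
            "mapreduce.input.fileinputformat.split.maxsize": "134217728",  # 128 MB
            # Hint for the number of map tasks in the job.
            "mapreduce.job.maps": "40",
        },
    }
]
```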
Question 98 of 100
A travel website needs to present a graphical quantitative summary of its daily bookings to website visitors for marketing purposes. The website has millions of visitors per day, but wants to control costs by implementing the least-expensive solution for this visualization. What is the most cost-effective solution?
A. Generate a static graph with a transient EMR cluster daily, and store it on Amazon S3.
B. Generate a graph using MicroStrategy backed by a transient EMR cluster.
C. Implement a Jupyter front-end provided by a continuously running EMR cluster leveraging spot instances for task nodes.
D. Implement a Zeppelin application that runs on a long-running EMR cluster.
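Option A amounts to a nightly batch job that renders one static image and drops it on S3, where it can be served cheaply to all visitors. A minimal sketch, where the bucket, key, and data are placeholders:

```python
import io

import boto3
import matplotlib.pyplot as plt

# Hypothetical daily bookings summary produced by the nightly batch job.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
bookings = [1200, 1350, 1280, 1500, 1720, 2100, 1980]

fig, ax = plt.subplots()
ax.bar(days, bookings)
ax.set_title("Daily bookings")

# Render once and store the static image on S3 to be served to visitors.
buf = io.BytesIO()
fig.savefig(buf, format="png")
buf.seek(0)
boto3.client("s3").put_object(Bucket="travel-site-assets", Key="daily-bookings.png", Body=buf)
```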
Question 99 of 100
You need to perform ad-hoc business analytics queries on well-structured data. Data comes in constantly at a high velocity. Your business intelligence team can understand SQL. What AWS service(s) should you look to first?
A. Kinesis Firehose + RDS
B. Kinesis Firehose + Redshift
C. EMR using Hive
D. EMR running Apache Spark
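Option B pairs Kinesis Firehose ingestion with Redshift as the SQL-queryable store. A boto3 sketch of such a delivery stream, where the ARNs, cluster URL, and credentials are placeholders:

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# Hypothetical delivery stream that stages data on S3 and COPYs it into Redshift.
firehose.create_delivery_stream(
    DeliveryStreamName="events-to-redshift",
    RedshiftDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "ClusterJDBCURL": "jdbc:redshift://analytics.example.us-east-1.redshift.amazonaws.com:5439/db",
        "CopyCommand": {"DataTableName": "events"},
        "Username": "firehose_user",
        "Password": "REPLACE_ME",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "BucketARN": "arn:aws:s3:::events-staging",
        },
    },
)
```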
Question 100 of 100
A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift holds the past two years of historical data. Game traffic varies throughout the year based on factors such as season, movie releases, and holidays. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table each week in advance. How should the administrator accomplish this task?
A. Feed the data into Amazon Machine Learning and build a regression model.
B. Feed the data into Spark MLlib and build a random forest model.
C. Feed the data into Apache Mahout and build a multi-classification model.
D. Feed the data into Amazon Machine Learning and build a binary classification model.
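Because the quantity being forecast (required throughput) is a continuous numeric value, the fit in option A is a regression model, whose prediction would then be applied to the table ahead of each week. A minimal boto3 sketch, where the IDs, table name, and capacity figures are placeholders:

```python
import boto3

ml = boto3.client("machinelearning", region_name="us-east-1")

# Hypothetical regression model trained on the two years of historical traffic.
ml.create_ml_model(
    MLModelId="game-traffic-forecast-v1",
    MLModelType="REGRESSION",
    TrainingDataSourceId="ds-historical-game-traffic",
)

# Apply the predicted capacity to the table ahead of the coming week.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")
dynamodb.update_table(
    TableName="game-state",
    ProvisionedThroughput={"ReadCapacityUnits": 4000, "WriteCapacityUnits": 1500},
)
```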