Practice Free D-DS-FN-23 Exam Online Questions
An IT department deployed a spam filter to reduce the amount of junk e-mail received by its employees. After six months, they notice that the spam filter is less effective than when initially deployed.
They examine the system running the spam filter and it appears to be operating normally.
What action would improve the effectiveness of the spam filter?
- A . Add more processing power to the spam filtering system
- B . Add more storage to the spam filtering system
- C . Create a linear regression model to calculate the probability of an email being spam
- D . Retrain the spam filter with newer examples of spam emails
What are the characteristics of Big Data?
- A . Data volume, processing complexity, and data structure variety.
- B . Data volume, business importance, and data structure variety.
- C . Data type, processing complexity, and data structure variety.
- D . Data volume, processing complexity, and business importance.
Which word or phrase completes the statement: “A data scientist would consider that an RDBMS is to a table as R is to a _____”?
- A . Data frame
- B . List
- C . Matrix
- D . Array
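To make the table analogy concrete, here is a minimal sketch of how tabular records with mixed column types might be held in R. The `emp` object and its values are invented for illustration, and the SQL in the comment is only a loose analogy.

```r
# Hypothetical records, invented for illustration only.
emp <- data.frame(
  id     = c(1L, 2L, 3L),
  name   = c("Ann", "Bo", "Cy"),
  salary = c(52000, 61000, 58000),
  stringsAsFactors = FALSE
)

# Each column has its own type, much like columns in a database table.
str(emp)

# Loosely analogous to: SELECT name FROM emp WHERE salary > 55000
high_paid <- emp[emp$salary > 55000, "name"]
print(high_paid)
```

A matrix or array would force every cell to share one type, which is why they map less naturally onto a relational table.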
Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?
- A . There is not enough data to create a test set.
- B . The data is unformatted.
- C . There are missing values in the data.
- D . There are categorical variables in the model.
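As a rough illustration of N-fold cross-validation on a small data set, the sketch below uses base R only. The data are synthetic, and the fold count `k = 5`, the variable names, and the use of RMSE as the error measure are all assumptions made for this example.

```r
set.seed(42)                       # reproducible synthetic data
n <- 30                            # deliberately small: too few rows to spare a hold-out set
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

k <- 5                             # number of folds (the "N" in N-fold)
fold <- sample(rep(1:k, length.out = n))   # random fold label for every row

rmse <- numeric(k)
for (i in 1:k) {
  train <- dat[fold != i, ]        # fit on the other k - 1 folds
  test  <- dat[fold == i, ]        # evaluate on the held-out fold
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$y - pred)^2))
}

cv_rmse <- mean(rmse)              # average error across all folds
print(cv_rmse)
```

Every row is used for both fitting and evaluation, just never in the same round, which is what makes the technique attractive when data are scarce.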
A disk drive manufacturer has a defect rate of less than 1.5% with 98% confidence. A quality assurance team samples 1000 disk drives and finds 14 defective units.
Which action should the team recommend?
- A . The manufacturing process is functioning properly and no further action is required
- B . A larger sample size should be taken to determine if the plant is operating correctly
- C . A smaller sample size should be taken to determine if the plant is operating correctly
- D . There is a flaw in the quality assurance process and the sample should be repeated
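The arithmetic behind this scenario can be checked with a short sketch. Using `binom.test` (an exact binomial test from base R's `stats` package) is an assumption made here for illustration; the question itself does not prescribe a method.

```r
defects <- 14
n       <- 1000
p_hat   <- defects / n             # observed defect rate: 0.014, i.e. 1.4%

# One-sided exact test: is the true defect rate below 1.5%?
res <- binom.test(defects, n, p = 0.015,
                  alternative = "less", conf.level = 0.98)

upper <- res$conf.int[2]           # 98% one-sided upper bound on the defect rate
print(p_hat)
print(upper)
```

The observed rate of 1.4% sits just under the 1.5% threshold, but with only 1000 units the upper confidence bound comes out above 1.5%, so this sample alone cannot confirm the claim at the stated confidence level.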
A data scientist wants to add a new categorical variable, X2, to a linear regression model Y = b0 + b1*X1.
How many terms should be added to the right-hand side of the equation if X2 has four possible values?
- A . 2
- B . 3
- C . 4
- D . 5
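One way to see how a categorical predictor expands into model terms is to inspect the design matrix R builds for it. The sketch below assumes R's default treatment (dummy) coding and uses made-up values and level names for `x1` and `x2`.

```r
# Hypothetical predictors, invented for illustration.
x1 <- c(1.2, 3.4, 2.2, 5.1, 4.0, 2.9, 3.3, 1.8)
x2 <- factor(c("A", "B", "C", "D", "A", "B", "C", "D"))   # four possible values

# Design matrix for Y = b0 + b1*X1 + (terms for X2)
mm <- model.matrix(~ x1 + x2)
print(colnames(mm))

# Count the columns contributed by x2
n_x2_terms <- sum(grepl("^x2", colnames(mm)))
print(n_x2_terms)
```

Under this coding one level serves as the reference and is absorbed into the intercept, so a factor with four levels contributes three indicator columns.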
Since R factors are categorical variables, they are most closely related to which data classification level?
- A . nominal
- B . ordinal
- C . interval
- D . ratio
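The distinction between a plain factor and an ordered factor can be sketched in a few lines of base R. The variable names and levels are invented for this example.

```r
# A plain factor: categories with no inherent order.
colour <- factor(c("red", "green", "blue", "green"))
colour_is_ordered <- is.ordered(colour)     # FALSE
print(levels(colour))

# Order has to be requested explicitly.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)
size_is_ordered <- is.ordered(size)         # TRUE
small_lt_large  <- size[1] < size[2]        # TRUE: "small" ranks below "large"
```

By default `factor()` creates unordered categories; rank comparisons such as `<` are meaningful only after the factor is declared ordered.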
The R vector “v” contains 16 elements.
Which R command modifies the vector to have the same elements in reverse order?
- A . v[1,16] <- v[16:1]
- B . v <- v[16:1]
- C . v[16:1] <- v[16:1]
- D . v <- v[16, 1]
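Reversal by index sequence can be tried on a small example. The contents of `v` below are made up; only the length of 16 comes from the question.

```r
v <- seq(10, 160, by = 10)     # a 16-element vector: 10, 20, ..., 160
original <- v                  # keep a copy for comparison

# Index with a descending sequence and assign the result back.
v <- v[16:1]

print(v)
same_as_rev <- identical(v, rev(original))   # base R's rev() does the same job
print(same_as_rev)
```

Note that a vector takes a single index, so forms with a comma inside the brackets, such as `v[16, 1]`, are matrix-style indexing and raise an error on a plain vector.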
You received 100,000 home loan records and want to quickly determine if there is any correlation between mortgage age and mortgage amount before conducting advanced analysis.
Which tool should be used for the preliminary analysis?
- A . Scatter plot
- B . Stacked Bar chart
- C . Box and Whisker plot
- D . Histogram
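A quick visual check of the relationship between two numeric variables might look like the sketch below. The data are synthetic (1000 rows rather than 100,000, with an invented negative relationship), and the variable names are assumptions for this example.

```r
set.seed(1)
n <- 1000
mortgage_age    <- runif(n, 0, 30)                       # years, synthetic
mortgage_amount <- 300000 - 5000 * mortgage_age +
                   rnorm(n, sd = 40000)                  # dollars, synthetic

# One point per loan; a visible trend suggests correlation.
plot(mortgage_age, mortgage_amount, pch = ".",
     xlab = "Mortgage age (years)", ylab = "Mortgage amount")

# A single number to accompany the picture.
r <- cor(mortgage_age, mortgage_amount)
print(r)
```

With very large data sets, a small plotting symbol such as `pch = "."` or a random subsample helps keep the plot readable.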
A data scientist is given an R data frame (i.e., empdata) with the following columns: Age, Salary, Occupation, Education, and Gender. The scientist wants to examine only the Salary and Occupation columns for ages greater than 40.
Which command extracts the appropriate rows and columns from the data frame?
- A . empdata[empdata$Age > 40, c("Salary","Occupation")]
- B . empdata[c("Salary","Occupation"), empdata$Age > 40]
- C . empdata[Age > 40, ("Salary","Occupation")]
- D . empdata[, c("Salary","Occupation")]$Age > 40
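Data frame indexing follows the pattern `df[rows, columns]`, which can be tried on a toy data set. The rows below are invented; only the column names come from the question.

```r
# Hypothetical employee records, invented for illustration.
empdata <- data.frame(
  Age        = c(35, 42, 51, 29),
  Salary     = c(48000, 67000, 82000, 39000),
  Occupation = c("Analyst", "Engineer", "Manager", "Clerk"),
  Education  = c("BS", "MS", "MBA", "HS"),
  Gender     = c("F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Row filter first (a logical vector), then the columns to keep.
over_40 <- empdata[empdata$Age > 40, c("Salary", "Occupation")]
print(over_40)

# subset() expresses the same selection with bare column names.
over_40_alt <- subset(empdata, Age > 40, select = c(Salary, Occupation))
```

Inside the square brackets, column names must be qualified (`empdata$Age`) and the column list must be built with `c()`; a bare `Age` or a parenthesised list without `c` will not work.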