Databricks Certified Associate Developer for Apache Spark 3.0 (Associate-Developer-Apache-Spark) Exam Practice Test
Which of the following code blocks shuffles DataFrame transactionsDf, which has 8 partitions, so that it has 10 partitions?
Correct Answer: D
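The answer options are not reproduced in this dump. For reference, a minimal sketch of the standard PySpark way to do what the question asks; note that coalesce() cannot increase the partition count:

# repartition() performs a full shuffle and can raise the partition count;
# coalesce() avoids a shuffle but can only lower it
repartitionedDf = transactionsDf.repartition(10)
print(repartitionedDf.rdd.getNumPartitions())  # prints 10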
The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__.__3__(__4__))
Correct Answer: C
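For reference, a sketch of one way the blanks can be filled to satisfy the description (select for blank 1, col("storeId") for blank 2, cast for blank 3, "string" for blank 4); whether this matches the lettered option cannot be verified from this dump:

from pyspark.sql.functions import col

# One-column DataFrame with storeId cast to string type
transactionsDf.select(col("storeId").cast("string"))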
The code block displayed below contains an error. The code block should return a new DataFrame that only contains rows from DataFrame transactionsDf in which the value in column predError is at least 5. Find the error.
Code block:
transactionsDf.where("col(predError) >= 5")
Correct Answer: C
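For reference: the bug is that a Python col() expression has been embedded inside a SQL filter string, which Spark's SQL parser cannot interpret. A sketch of two equivalent corrected forms:

from pyspark.sql.functions import col

# Column-expression syntax
transactionsDf.where(col("predError") >= 5)

# Equivalent SQL-string syntax
transactionsDf.where("predError >= 5")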
The code block displayed below contains an error. The code block is intended to return all columns of DataFrame transactionsDf except for columns predError, productId, and value. Find the error.
Excerpt of DataFrame transactionsDf:
transactionsDf.select(~col("predError"), ~col("productId"), ~col("value"))
Correct Answer: E
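For reference: in PySpark, ~ is the boolean NOT operator on column expressions; it does not exclude columns from a select. A sketch of the usual way to keep every column except the three named ones:

# drop() returns a new DataFrame without the listed columns
transactionsDf.drop("predError", "productId", "value")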
Which is the highest level in Spark's execution hierarchy?
Correct Answer: D
The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively.
Find the error.
Code block:
spark.createDataFrame([("red",), ("blue",), ("green",)], "color")
Instead of calling spark.createDataFrame, just DataFrame should be called.
Correct Answer: A
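For reference, a sketch of one working correction: spark.createDataFrame does exist on the SparkSession, but the single column name needs to be wrapped in a list (a bare string is treated as a DDL schema string and fails to parse here):

# A one-element list of column names yields the single column "color"
spark.createDataFrame([("red",), ("blue",), ("green",)], ["color"])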
The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for those rows of DataFrame itemsDf in which column attributes contains the element cozy.
A sample of DataFrame itemsDf is below.
Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))
Correct Answer: C
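For reference, a sketch of one combination that satisfies the description: filter with array_contains for blanks 1 and 2, then select with explode for blanks 3 through 6. explode() names its output column col by default, giving the two required columns:

from pyspark.sql.functions import array_contains, explode

# Keep rows whose attributes array contains "cozy", then emit one
# output row per array element alongside the matching itemId
itemsDf.filter(array_contains("attributes", "cozy")) \
    .select("itemId", explode("attributes"))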
Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
Correct Answer: D
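For reference, a sketch of the standard reader chain (names follow the question; which lettered option this corresponds to is not visible here):

# Attach the schema before the format-specific load
spark.read.schema(fileSchema).parquet(filePath)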
Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?
Correct Answer: C
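For reference, a sketch: an ascending sort puts the smallest values first, and show() prints the requested number of rows as a formatted ASCII table, which covers the "nicely formatted" requirement:

# Sort ascending by value, then print the first 10 rows
transactionsDf.sort("value").show(10)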
In which order should the code blocks shown below be run in order to return the number of records that are not empty in column value in the DataFrame resulting from an inner join of DataFrame transactionsDf and itemsDf on columns productId and itemId, respectively?
1. .filter(~isnull(col('value')))
2. .count()
3. transactionsDf.join(itemsDf, col("transactionsDf.productId")==col("itemsDf.itemId"))
4. transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner')
5. .filter(col('value').isnotnull())
6. .sum(col('value'))
Correct Answer: C
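For reference: blocks 4, 1, and 2 chain into valid PySpark. Block 3 passes DataFrame-qualified names to col(), which Spark cannot resolve; block 5 calls isnotnull(), which does not exist (the Column method is isNotNull()); and block 6 sums the column instead of counting records. A sketch of the assembled chain:

from pyspark.sql.functions import col, isnull

(transactionsDf
    .join(itemsDf, transactionsDf.productId == itemsDf.itemId, how='inner')  # block 4
    .filter(~isnull(col('value')))                                           # block 1
    .count())                                                                # block 2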