Databricks Certified Associate Developer for Apache Spark 3.0 (Associate-Developer-Apache-Spark) Exam Practice Test
Which of the following code blocks shuffles DataFrame transactionsDf, which has 8 partitions, so that it has 10 partitions?
Correct Answer: D
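The answer options are not reproduced in this dump. For reference, a minimal sketch of the standard PySpark way to do what the question asks; note that coalesce() cannot increase the partition count:

# repartition() performs a full shuffle and can raise the partition count;
# coalesce() avoids a shuffle but can only lower it
repartitionedDf = transactionsDf.repartition(10)
print(repartitionedDf.rdd.getNumPartitions())  # prints 10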
The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__.__3__(__4__))
Correct Answer: C
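For reference, a sketch of one way the blanks can be filled to satisfy the description (select for blank 1, col("storeId") for blank 2, cast for blank 3, "string" for blank 4); whether this matches the lettered option cannot be verified from this dump:

from pyspark.sql.functions import col

# One-column DataFrame with storeId cast to string type
transactionsDf.select(col("storeId").cast("string"))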
The code block displayed below contains an error. The code block should return a new DataFrame that only contains rows from DataFrame transactionsDf in which the value in column predError is at least 5. Find the error.
Code block:
transactionsDf.where("col(predError) >= 5")
Correct Answer: C
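For reference: the bug is that a Python col() expression has been embedded inside a SQL filter string, which Spark's SQL parser cannot interpret. A sketch of two equivalent corrected forms:

from pyspark.sql.functions import col

# Column-expression syntax
transactionsDf.where(col("predError") >= 5)

# Equivalent SQL-string syntax
transactionsDf.where("predError >= 5")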
The code block displayed below contains an error. The code block is intended to return all columns of DataFrame transactionsDf except for columns predError, productId, and value. Find the error.
Excerpt of DataFrame transactionsDf:
transactionsDf.select(~col("predError"), ~col("productId"), ~col("value"))
Correct Answer: E
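For reference: in PySpark, ~ is the boolean NOT operator on column expressions; it does not exclude columns from a select. A sketch of the usual way to keep every column except the three named ones:

# drop() returns a new DataFrame without the listed columns
transactionsDf.drop("predError", "productId", "value")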
Which is the highest level in Spark's execution hierarchy?
Correct Answer: D
The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively.
Find the error.
Code block:
spark.createDataFrame([("red",), ("blue",), ("green",)], "color")
Instead of calling spark.createDataFrame, just DataFrame should be called.
Correct Answer: A
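For reference, a sketch of one working correction: spark.createDataFrame does exist on the SparkSession, but the single column name needs to be wrapped in a list (a bare string is treated as a DDL schema string and fails to parse here):

# A one-element list of column names yields the single column "color"
spark.createDataFrame([("red",), ("blue",), ("green",)], ["color"])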
The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for those rows of DataFrame itemsDf in which column attributes contains the element cozy.
A sample of DataFrame itemsDf is below.
Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))
Correct Answer: C
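For reference, a sketch of one combination that satisfies the description: filter with array_contains for blanks 1 and 2, then select with explode for blanks 3 through 6. explode() names its output column col by default, giving the two required columns:

from pyspark.sql.functions import array_contains, explode

# Keep rows whose attributes array contains "cozy", then emit one
# output row per array element alongside the matching itemId
itemsDf.filter(array_contains("attributes", "cozy")) \
    .select("itemId", explode("attributes"))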
Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
Correct Answer: D
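For reference, a sketch of the standard reader chain (names follow the question; which lettered option this corresponds to is not visible here):

# Attach the schema before the format-specific load
spark.read.schema(fileSchema).parquet(filePath)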
Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?
Correct Answer: C
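For reference, a sketch: an ascending sort puts the smallest values first, and show() prints the requested number of rows as a formatted ASCII table, which covers the "nicely formatted" requirement:

# Sort ascending by value, then print the first 10 rows
transactionsDf.sort("value").show(10)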
In which order should the code blocks shown below be run in order to return the number of records that are not empty in column value in the DataFrame resulting from an inner join of DataFrame transactionsDf and itemsDf on columns productId and itemId, respectively?
1. .filter(~isnull(col('value')))
2. .count()
3. transactionsDf.join(itemsDf, col("transactionsDf.productId")==col("itemsDf.itemId"))
4. transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner')
5. .filter(col('value').isnotnull())
6. .sum(col('value'))
Correct Answer: C
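For reference: blocks 4, 1, and 2 chain into valid PySpark. Block 3 passes DataFrame-qualified names to col(), which Spark cannot resolve; block 5 calls isnotnull(), which does not exist (the Column method is isNotNull()); and block 6 sums the column instead of counting records. A sketch of the assembled chain:

from pyspark.sql.functions import col, isnull

(transactionsDf
    .join(itemsDf, transactionsDf.productId == itemsDf.itemId, how='inner')  # block 4
    .filter(~isnull(col('value')))                                           # block 1
    .count())                                                                # block 2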