Cloudera CDP Data Engineer - Certification - CDP-3002 Exam Practice Test
In an Airflow DAG, you have tasks A, B, C, and D. Task A must complete before B and C can start, but B and C can run in parallel. Task D should only run once both B and C have completed. How do you set up these dependencies?
Correct Answer: D
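The answer options for this question are not reproduced here, so the following is only a sketch of the dependency shape the question describes. It is plain Python that uses the standard-library graphlib module to model the ordering, not Airflow itself; the task names and the execution_waves helper are illustrative assumptions. In an Airflow DAG file the same fan-out and fan-in shape is commonly written with the bitshift operators, as noted in the comment.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that must finish before it may start.
# In an Airflow DAG file this shape is commonly expressed as:
#     A >> [B, C] >> D
dependencies = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave may run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose upstreams are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
print(waves)  # [['A'], ['B', 'C'], ['D']]
```

B and C land in the same wave because each depends only on A, and D appears last because it waits for both of them.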
For a Hive table that is both partitioned and bucketed, what considerations must be taken into account to optimize a join query involving this table?
Correct Answer: C
You're working with a CSV file containing missing data. How can you efficiently handle missing values in a Spark DataFrame created from this file?
Correct Answer: C
What are the potential trade-offs to consider when using checkpointing in Spark applications?
Correct Answer: A,B
In Apache Spark, which of the following is the most effective strategy for minimizing data shuffling across nodes in a cluster?
Correct Answer: B
Which of the following is true about persisting RDDs in Apache Spark?
A Persisting an RDD in memory allows for faster access but increases the risk of data loss.
Correct Answer: C
You are writing a PySpark application where you need to collect the final results from various Executors and present them to the user. Which aspect of the Spark Driver's role is primarily involved in this process?
Correct Answer: B
You are working with a large, skewed dataset in Spark. How would you optimize processing to mitigate the impact of skew and improve performance?
Correct Answer: B
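The answer options are not reproduced here, so it is not stated which technique the exam expects. One commonly cited mitigation for a skewed key is salting: appending a random suffix to the key so that its records spread over several partitions. The sketch below illustrates the idea in plain Python rather than Spark; the function names, the partition count, the number of salt buckets, and the use of CRC32 as the partitioning hash are all assumptions made for illustration.

```python
import random
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    """Assign a key to a partition with a deterministic hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def load_per_partition(keys, num_partitions):
    """Count how many records each partition would receive."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt(key, salt_buckets, rng):
    """Append a random suffix so one hot key becomes several distinct keys."""
    return f"{key}_{rng.randrange(salt_buckets)}"


rng = random.Random(0)  # fixed seed so the sketch is repeatable
keys = ["hot"] * 900 + [f"k{i}" for i in range(100)]  # one heavily skewed key

plain = load_per_partition(keys, 8)
salted = load_per_partition([salt(k, 16, rng) for k in keys], 8)

print("largest partition without salting:", max(plain))
print("largest partition with salting:   ", max(salted))
```

Without salting, every "hot" record hashes to the same partition. With salting, the hot key is split across up to 16 salted keys, so the largest partition shrinks. In a real join the other side would need matching salted keys, which this sketch does not cover.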
You have deployed a Spark application on Kubernetes, which is experiencing intermittent failures. To improve fault tolerance, you decide to implement checkpointing. Which of the following is the best approach to add checkpointing in a PySpark application?
Correct Answer: A
You want to debug an issue within your Spark application that interacts with Hive tables. What tools and techniques can you employ for effective debugging?
Correct Answer: B,D
In Spark, when is it most beneficial to use the repartitionByRange method?
Correct Answer: C
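As background for this question, range partitioning places rows with nearby values of the partitioning column in the same partition, so each partition covers a contiguous value range. The sketch below shows that idea in plain Python; it is not Spark's implementation (Spark estimates the boundaries by sampling), and the helper names and sample values are assumptions made for illustration.

```python
from bisect import bisect_left


def range_boundaries(values, num_partitions):
    """Pick upper bounds that split the sorted values into equal-sized ranges."""
    ordered = sorted(values)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i) - 1] for i in range(1, num_partitions)]


def range_partition(values, num_partitions):
    """Send each value to the partition whose range contains it."""
    bounds = range_boundaries(values, num_partitions)
    parts = [[] for _ in range(num_partitions)]
    for v in values:
        parts[bisect_left(bounds, v)].append(v)
    return parts


values = [7, 1, 12, 4, 9, 3, 10, 6, 2, 11, 5, 8]
parts = range_partition(values, 3)
print(parts)  # [[1, 4, 3, 2], [7, 6, 5, 8], [12, 9, 10, 11]]
```

Each partition holds a contiguous range (1 to 4, 5 to 8, 9 to 12), but rows inside a partition keep their arrival order. That layout suits later range filters or per-partition sorting on the same column.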
You need to create a new Hive table from a Spark DataFrame. What are the different approaches you can consider?
Correct Answer: B
How does Spark handle data shuffling during distributed processing?
Correct Answer: D
What happens when a task in Airflow is marked as "skipped"?
Correct Answer: D
In a CI/CD pipeline, what is a key security consideration when integrating Cloudera Data Engineering (CDE) service API calls for deploying Spark jobs?
Correct Answer: C