This set of Important Apache Hive MCQs covers advanced concepts of Apache Hive, including Tez, Cost-Based Optimizer (CBO), bucket map joins, ACID transactions, dynamic partitioning, query optimization, and advanced Hive architecture. Useful for GATE, UGC NET, University Semester Exams, and Big Data Technology preparation.
Topic: Big Data (Apache Hive) | Set: 3
Difficulty: Hard | Total Questions: 15
Important Apache Hive MCQs with Answers
Q1. What is the difference between SORT BY and ORDER BY?
A. ORDER BY sorts within each reducer; SORT BY sorts the entire result set
B. ORDER BY performs a total sort across all reducers; SORT BY sorts per reducer
C. There is no difference
D. ORDER BY is faster
View Answer & Explanation
Answer: B
Explanation: ORDER BY guarantees total order by forcing all data to a single reducer, while SORT BY performs a parallel sort per reducer.
Q2. Which configuration parameter is critical for tuning the “Reducer count” in a cluster?
A. hive.exec.reducers.max
B. mapreduce.job.reduces
C. hive.exec.reducers.bytes.per.reducer
D. Both B and C
View Answer & Explanation
Answer: C
Explanation: Tuning bytes.per.reducer is the standard way to control the output file size and number of reducers based on input data size.
Q3. What is “Tez” in the context of Hive?
A. A new programming language
B. An execution engine that creates a DAG (Directed Acyclic Graph) of tasks instead of linear MapReduce jobs
C. A database for storing metadata
D. A GUI for Hive
View Answer & Explanation
Answer: B
Explanation: Tez improves performance by reusing containers and minimizing the number of times data is written to HDFS between stages.
Q4. How do you handle “NULL” values in Hive when performing joins?
A. Hive handles them automatically
B. By using COALESCE or adding a specific condition to the join clause
C. You cannot join on NULL
D. They are ignored by default
View Answer & Explanation
Answer: B
Explanation: Hive join behavior with NULL values can be tricky; using COALESCE to turn NULLs into a dummy value is a common best practice.
Q5. What is the function of the “Cost-Based Optimizer (CBO)”?
A. It calculates the financial cost of running a query
B. It uses statistics to choose the best join order and physical operators
C. It optimizes the HDFS storage space
D. It manages security roles
View Answer & Explanation
Answer: B
Explanation: CBO optimizes query performance by reordering joins and choosing better scan/filter operations based on table statistics.
Q6. Which of these is NOT a valid Hive storage handler?
A. HBaseStorageHandler
B. JDBCStorageHandler
C. ORCStorageHandler
D. HDFSStorageHandler
View Answer & Explanation
Answer: D
Explanation: While Hive interacts with HDFS, HDFSStorageHandler is not a standard storage handler name (HDFS is the default underlying store).
Q7. What does BUCKET MAP JOIN optimize?
A. Join performance when both tables are bucketed on the join key
B. Partition scans
C. Group By performance
D. Sort performance
View Answer & Explanation
Answer: A
Explanation: If both tables are bucketed on the same join key, Hive can perform a join by only loading matching buckets, significantly saving memory.
Q8. Why is DISTRIBUTE BY used?
A. To force rows with the same key to go to the same reducer
B. To order the entire dataset
C. To limit the number of output files
D. To enable dynamic partitioning
View Answer & Explanation
Answer: A
Explanation: DISTRIBUTE BY ensures that data is partitioned across reducers based on a column, often used before a SORT BY to control data flow.
Q9. What is the “Thrift Service” bottleneck limitation?
A. It is CPU-bound
B. It is single-threaded (older versions) or resource-constrained
C. It cannot handle HDFS
D. It is only for read-only queries
View Answer & Explanation
Answer: B
Explanation: In older versions, the Hive Thrift server had threading limits that impacted high concurrency, leading to the development of HiveServer2.
Q10. How do you perform a “Lateral View” in Hive?
A. It is used in conjunction with UDTFs (like explode) to flatten arrays/maps into rows
B. It is a type of Join
C. It is used to rename tables
D. It is used for partitioning
View Answer & Explanation
Answer: A
Explanation: LATERAL VIEW is essential for expanding nested arrays or maps into rows so they can be joined or queried like standard relational tables.
Q11. What happens if hive.fetch.task.conversion is set to more?
A. All queries run in MapReduce
B. Simple queries (like SELECT *) are executed directly on the client without MapReduce
C. It increases query speed significantly
D. It disables the Metastore
View Answer & Explanation
Answer: B
Explanation: This optimization allows Hive to bypass the overhead of creating MapReduce/Tez jobs for simple filtering or projection tasks.
Q12. What is the “Union” operator constraint in older Hive versions?
A. It can only be used with 2 tables
B. It required subqueries; it could not be used directly in the main FROM clause
C. It does not support UNION ALL
D. It is not supported
View Answer & Explanation
Answer: B
Explanation: Historically, Hive required UNION ALL to be part of a subquery, though modern versions have relaxed this constraint.
Q13. What is “Transactionality” in Hive (ACID)?
A. Hive does not support ACID
B. Enabled for ORC tables; allows INSERT/UPDATE/DELETE/MERGE operations
C. Enabled for all file formats
D. Only for memory tables
View Answer & Explanation
Answer: B
Explanation: ACID support in Hive allows for transactional row-level updates, but it requires the ORC file format and specifically configured settings.
Q14. What is the impact of “Dynamic Partitioning” on HDFS?
A. It creates many small files, which can strain the NameNode
B. It creates large, consolidated files
C. It reduces disk usage
D. It makes queries slower
View Answer & Explanation
Answer: A
Explanation: Improper use of dynamic partitioning creates one file per partition per reducer, which can lead to thousands of small files, creating a “small file problem” for HDFS.
Q15. How do you view the execution plan of a Hive query?
A. SHOW PLAN;
B. EXPLAIN [EXTENDED] query;
C. DESCRIBE FORMATTED query;
D. DEBUG query;
View Answer & Explanation
Answer: B
Explanation: The EXPLAIN command shows the Abstract Syntax Tree and the physical execution plan (Map/Reduce stages).
Conclusion
These Hive MCQ questions covered advanced concepts such as Tez execution engine, Cost-Based Optimizer (CBO), bucket map joins, ACID transactions, dynamic partitioning, and advanced query optimization in Hive.
Practicing these questions is useful for Big Data Technology, Hadoop ecosystem learning, GATE CS, UGC NET, and university semester examinations.
For better understanding of theory and concepts, refer to Apache Hive