Important Apache Hive MCQs with Answers (Set 4) | Big Data Technology

This set of Apache Hive MCQs covers advanced concepts of Apache Hive, including query optimization, ORC files, partition pruning, bucketing, small file handling, Hive indexes, and performance tuning techniques in Hive. Useful for GATE, UGC NET, University Semester Exams, and Big Data Technology preparation.

Topic: Big Data (Apache Hive) | Set: 4

Difficulty: Hard to Medium | Total Questions: 15


Apache Hive MCQs with Answers

Q1. Which operation in Hive is traditionally the most expensive?

A. SELECT
B. JOIN (on large datasets)
C. DESCRIBE
D. SHOW TABLES

View Answer & Explanation

Answer: B

Explanation: Joins require shuffling data across the network, which is the most I/O and network-intensive operation in distributed systems.


Q2. Why are ORC files superior for large queries?

A. They support better compression and use column-level stats to skip stripes
B. They are plain text
C. They are easier to read
D. They use less CPU

View Answer & Explanation

Answer: A

Explanation: Columnar storage allows Hive to read only the columns needed, and striping allows skipping blocks (stripes) that don’t match the query filter.


Q3. How do you handle a “Cartesian Product” in Hive?

A. Hive throws an error by default unless hive.mapred.mode is set to nonstrict
B. It is always allowed
C. It is never allowed
D. It is faster than a join

View Answer & Explanation

Answer: A

Explanation: Cartesian products are usually accidental and highly expensive, so Hive blocks them unless explicitly allowed by the user.


Q4. What is the “Hive Warehouse Connector”?

A. A tool to connect Hive to Spark
B. A tool to connect Hive to a web browser
C. A tool to connect Hive to local files
D. A security tool

View Answer & Explanation

Answer: A

Explanation: The Hive Warehouse Connector (HWC) allows Apache Spark to read and write to Hive Managed/ACID tables efficiently.


Q5. What is the role of hive.stats.autogather?

A. To automatically gather statistics during data loading
B. To delete logs
C. To update the cluster firmware
D. To list users

View Answer & Explanation

Answer: A

Explanation: Enabling this property ensures statistics are updated during INSERT operations, keeping the CBO metadata current.


Q6. When is GROUP BY better than DISTINCT?

A. For counting unique values
B. When data is already sorted
C. For large datasets because GROUP BY can use combiner optimization
D. Never

View Answer & Explanation

Answer: C

Explanation: GROUP BY can utilize map-side combiners to partially aggregate data, whereas DISTINCT often forces more data to reducers.


Q7. What is the “Small File Problem” in HDFS?

A. Large files take too much space
B. Too many small files exhaust the NameNode’s memory due to metadata overhead
C. Small files cannot be read
D. Small files are faster

View Answer & Explanation

Answer: B

Explanation: Every file, block, and directory takes up memory in the NameNode; an excessive number of small files can crash the NameNode.


Q8. How do you merge small files in Hive?

A. Using ALTER TABLE … CONCATENATE (for ORC) or a re-insert
B. Using HDFS delete
C. You cannot merge them
D. Using a shell script

View Answer & Explanation

Answer: A

Explanation: The CONCATENATE command or performing an INSERT OVERWRITE into the same table are common strategies to compact small files into fewer, larger blocks.


Q9. What does set hive.enforce.bucketing=true; do?

A. It forces Hive to respect the number of buckets when writing data
B. It deletes buckets
C. It renames buckets
D. It creates partitions

Answer: A

View Answer & Explanation

Explanation: Without this setting, Hive might ignore the bucket count; setting it to true ensures the output directory structure matches the bucket definition.


Q10. What is a “Hive Index”?

A. A database-style index to speed up column lookups
B. A way to store tables in a specific order
C. A compression format
D. A security feature

View Answer & Explanation

Answer: A

Explanation: Hive indexes can be used to speed up queries, though they are often less popular than partitioning/bucketing due to maintenance overhead.


Q11. Which property controls the “Join” memory limit?

A. hive.auto.convert.join.noconditionaltask.size
B. hive.join.memory.limit
C. mapred.child.java.opts
D. hive.memory.size

View Answer & Explanation

Answer: A

Explanation: This property defines the threshold for the size of a table to be held in memory for a Map-side (Broadcast) join.


Q12. What happens to “unused” partitions in a query?

A. They are read but filtered later
B. Hive uses “Partition Pruning” to skip them entirely, saving I/O
C. They are deleted
D. They cause errors

View Answer & Explanation

Answer: B

Explanation: Partition pruning is a key optimization that ensures only the necessary directory paths are opened during a scan.


Q13. What is the primary benefit of ORC over Parquet for Hive?

A. They are the same
B. ORC is native to Hive and often has better compression/indexing support for Hive-specific features
C. Parquet cannot be used in Hive
D. ORC is plain text

View Answer & Explanation

Answer: B

Explanation: While both are excellent, ORC was specifically designed for Hive, offering tighter integration and metadata support.


Q14. Can Hive queries be scheduled?

A. Only using Cron
B. Yes, using external tools like Apache Oozie or Airflow
C. Hive has a built-in scheduler
D. No, they must be manual

View Answer & Explanation

Answer: B

Explanation: Hive does not have an internal job scheduler, so tools like Oozie, Airflow, or Cron are used for automation.


Q15. What is the “Hive Metastore Service”?

A. A separate Java service that provides the schema interface to the database
B. The database itself
C. A shell script
D. A MapReduce job

View Answer & Explanation

Answer: A

Explanation: The Metastore Service acts as a Thrift-based middleware that allows Hive to communicate with the backing database (MySQL/PostgreSQL) safely.


Conclusion

These Hive MCQs questions covered advanced concepts such as ORC optimization, partition pruning, bucketing enforcement, Hive indexes, small file handling, query optimization, and advanced Hive performance tuning.

Practicing these questions is useful for Big Data Technology, Hadoop ecosystem learning, GATE CS, UGC NET, and university semester examinations.

For better understanding of concepts and architecture, refer Apache Hive.


What’s Next?

Prepare smarter with PrepIt 📚

Subscribe to receive new MCQ sets, exam practice questions, semester resources, and technical interview preparation updates directly in your inbox.

We don’t spam! Read our privacy policy for more info.

Leave a Reply

Your email address will not be published. Required fields are marked *