This set of Apache Hive MCQs covers important concepts of Apache Hive, including HiveQL, Metastore, partitioning, bucketing, managed and external tables, data types, and Hive architecture. Useful for GATE, UGC NET, University Semester Exams, and Big Data Technology preparation.
Topic: Big Data (Apache Hive) | Set: 1
Difficulty: Easy to Medium | Total Questions: 15
Important Apache Hive MCQs with Answers
Q1. What is Apache Hive primarily designed for?
A. Storing small, real-time transactional data
B. Performing analytical queries on large datasets stored in HDFS
C. Managing cluster-wide hardware resources
D. Replacing HDFS as a storage layer
Answer: B
Explanation: Hive is data warehouse software that lets you read, write, and manage large datasets in distributed storage such as HDFS using SQL-like queries.
Q2. Which language does Hive use to query data?
A. Pig Latin
B. HiveQL (HQL)
C. Java
D. Scala
Answer: B
Explanation: HiveQL is a query language very similar to SQL, designed specifically to work with Hive.
Q3. What is the primary role of the Hive Metastore?
A. Storing the actual data rows
B. Storing metadata about tables, columns, and partitions
C. Executing MapReduce jobs
D. Managing YARN containers
Answer: B
Explanation: The Metastore keeps the schema and storage-location information for Hive tables in a relational database (Derby by default in embedded mode; MySQL is a common choice in production).
Q4. Which component compiles HiveQL into execution plans?
A. The Driver
B. The Client
C. The DataNode
D. The NameNode
Answer: A
Explanation: The Hive Driver receives the query and hands it to the compiler, which parses it and produces an execution plan that runs on an engine such as MapReduce or Tez.
Q5. What is the default file format for a Hive table?
A. ORC
B. Parquet
C. TextFile
D. Avro
Answer: C
Explanation: Unless a format is specified with STORED AS, Hive stores table data as plain text files (TextFile), a row-oriented format.
Q6. What is a “Managed Table” in Hive?
A. A table where Hive controls the data lifecycle and location
B. A table that only stores metadata but no data
C. A table that is always updated in real-time
D. A table that cannot be queried
Answer: A
Explanation: Managed (Internal) tables store data in the Hive warehouse directory, and dropping the table deletes both schema and data.
Q7. Which command is used to see the list of tables in a Hive database?
A. SHOW TABLES;
B. LIST TABLES;
C. GET TABLES;
D. SELECT TABLES;
Answer: A
Explanation: SHOW TABLES is the standard HiveQL command to list all tables in the currently selected database.
Q8. What is “Partitioning” in Hive used for?
A. Compressing data to save space
B. Dividing a table into sub-directories based on column values to improve query performance
C. Creating a copy of the table on a different cluster
D. Encrypting column data
Answer: B
Explanation: Partitioning optimizes queries by allowing Hive to skip scanning directories that do not match the partition filter.
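The pruning idea behind Q8 can be illustrated with a small Python sketch. This is a toy model, not Hive code: the dictionary keys stand in for partition sub-directories, and the table layout, the `country` partition column, and the `scan` helper are all hypothetical names chosen for illustration.

```python
# Toy model of partition pruning; this is NOT Hive code.
# Each key mimics a partition sub-directory such as .../sales/country=IN
# (the table, column, and rows are made up for illustration).
partitions = {
    "country=IN": [("order-1", 250), ("order-2", 100)],
    "country=US": [("order-3", 400)],
    "country=DE": [("order-4", 75)],
}

def scan(partitions, country=None):
    """Return (rows, directories_read) for an optional partition filter."""
    if country is None:
        # No filter on the partition column: every directory is read.
        selected = list(partitions)
    else:
        # Filter on the partition column: non-matching directories are skipped.
        wanted = f"country={country}"
        selected = [name for name in partitions if name == wanted]
    rows = [row for name in selected for row in partitions[name]]
    return rows, selected

rows, dirs_read = scan(partitions, country="IN")
all_rows, all_dirs = scan(partitions)
print(dirs_read)      # only the matching directory
print(len(all_dirs))  # every directory when there is no filter
```

With the filter, only one of the three directories is read, which is the saving that partitioning aims for.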
Q9. What is the primary difference between a View and a Table in Hive?
A. Tables store data; Views are virtual and store only a query definition
B. Views are faster than tables
C. Tables are stored in memory only
D. Views store physical data on disk
Answer: A
Explanation: A View is a virtual table that executes its stored query every time it is accessed, whereas a Table contains actual stored data.
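The view behavior described in Q9 can be observed with a short Python sketch. Note the assumption: it uses SQLite through Python's built-in `sqlite3` module as a stand-in, not Hive itself, and the `orders` table and `big_orders` view are hypothetical. The point it shows, that a view stores a query rather than rows, is the same one the question tests.

```python
import sqlite3

# SQLite stands in for Hive here; this is an illustration, not a Hive session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 50), (2, 500)")

# The view stores only the query definition, not a copy of the rows.
conn.execute(
    "CREATE VIEW big_orders AS SELECT id, amount FROM orders WHERE amount > 100"
)

before = conn.execute("SELECT id FROM big_orders ORDER BY id").fetchall()

# A new row in the base table shows up in the view without touching the view,
# because the stored query runs again on every access.
conn.execute("INSERT INTO orders VALUES (3, 900)")
after = conn.execute("SELECT id FROM big_orders ORDER BY id").fetchall()

print(before)  # [(2,)]
print(after)   # [(2,), (3,)]
```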
Q10. Which data type in Hive represents a single true/false value?
A. TINYINT
B. BOOLEAN
C. STRING
D. BINARY
Answer: B
Explanation: The BOOLEAN data type in Hive stores true or false values.
Q11. What is the purpose of the EXTERNAL keyword in CREATE EXTERNAL TABLE?
A. To prevent the table from being queried
B. To create a table where the data is located outside the Hive warehouse directory
C. To force the table to use MapReduce
D. To encrypt the table
Answer: B
Explanation: External tables let Hive query data that it does not manage, typically stored outside the Hive warehouse directory, so dropping the table does not delete the underlying data.
Q12. Which interface is used to run Hive queries from a command line?
A. Hive CLI / Beeline
B. HDFS Shell
C. YARN Web UI
D. MapReduce JobTracker
Answer: A
Explanation: The Hive CLI and Beeline (which connects through HiveServer2) are the standard command-line interfaces to Hive.
Q13. Can Hive handle complex data types?
A. No, only primitive types
B. Yes, including Arrays, Maps, and Structs
C. Only if it is stored in JSON
D. Only via custom Java functions
Answer: B
Explanation: Hive supports complex types like ARRAY, MAP, and STRUCT to handle nested data structures.
Q14. What is a “Bucket” in Hive?
A. A physical folder for partitions
B. A technique to decompose table data into smaller, more manageable files based on hash values
C. A user-defined function
D. A configuration for the Metastore
Answer: B
Explanation: Bucketing distributes rows into a fixed number of files based on the hash of a column, which can improve join performance and makes sampling efficient.
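A rough Python sketch of the hash-then-modulo idea behind bucketing follows. It is a simplification under stated assumptions: the integer value itself is used as the hash, the `user_id` column and the bucket count of 4 are hypothetical, and Hive's actual hash function may differ by data type and version.

```python
NUM_BUCKETS = 4  # hypothetical, in the spirit of CLUSTERED BY (user_id) INTO 4 BUCKETS

def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    """Pick a bucket as hash(value) mod bucket count.

    Assumption: the integer is its own hash, which keeps the example
    deterministic; this is not a claim about Hive's exact hash function.
    """
    return user_id % num_buckets

def bucketize(user_ids, num_buckets=NUM_BUCKETS):
    """Group ids into a fixed number of buckets (one list per bucket file)."""
    buckets = {i: [] for i in range(num_buckets)}
    for uid in user_ids:
        buckets[bucket_for(uid, num_buckets)].append(uid)
    return buckets

buckets = bucketize([101, 102, 103, 104, 105, 108])
print(buckets)  # {0: [104, 108], 1: [101, 105], 2: [102], 3: [103]}
```

Because the same key always lands in the same bucket, two tables bucketed the same way on the join column can be joined bucket by bucket, and a sample can be taken by reading a single bucket.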
Q15. What happens when you perform a DROP TABLE on an External Table?
A. The table metadata and the underlying data are both deleted
B. Only the table metadata is removed; the data stays in HDFS
C. An error is thrown
D. The data is moved to the trash
Answer: B
Explanation: Since an external table does not “own” the data, Hive only deletes the reference (metadata) in the Metastore.
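The contrast between Q6 and Q15 can be summarized in a small Python model. This is a sketch of the behavior described in the explanations, not Hive's implementation: the `metastore` and `storage` dictionaries, the table names, and the paths are hypothetical.

```python
# Toy model of DROP TABLE for managed vs external tables; NOT Hive code.
# "metastore" holds metadata, "storage" stands in for files in HDFS.
metastore = {
    "sales_managed": {"type": "MANAGED", "location": "/warehouse/sales_managed"},
    "sales_external": {"type": "EXTERNAL", "location": "/data/raw/sales"},
}
storage = {
    "/warehouse/sales_managed": ["row1"],
    "/data/raw/sales": ["row1", "row2"],
}

def drop_table(metastore, storage, name):
    """Remove a table's metadata; delete its data only if the table is managed."""
    table = metastore.pop(name)  # metadata is removed in both cases
    if table["type"] == "MANAGED":
        storage.pop(table["location"], None)  # managed: the data goes too
    return table

drop_table(metastore, storage, "sales_managed")
drop_table(metastore, storage, "sales_external")

print(metastore)      # {} since both table definitions are gone
print(list(storage))  # ['/data/raw/sales'] since the external data survives
```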
Conclusion
These Apache Hive MCQs covered important concepts such as HiveQL, Metastore, partitioning, bucketing, managed and external tables, data types, and Hive architecture in Hadoop.
Practicing these questions is useful for Big Data Technology, Hadoop ecosystem learning, GATE CS, UGC NET, and university semester examinations.
For a better understanding of the theory and concepts, refer to Apache Hive.