This set of Data Warehousing MCQs covers advanced concepts of ETL Systems & Data Integration, including ELT pipelines, Change Data Capture (CDC), staging areas, incremental loading, snapshot fact tables, and enterprise ETL optimization techniques. Useful for GATE, UGC NET, University Semester Exams, and Big Data Technology preparation.
Topic: ETL Systems & Data Integration | Set: 4
Difficulty: Hard to Medium | Total Questions: 15
Important Data Warehousing MCQs with Answers
Q1. What is the primary operational shift when moving from an ETL (Extract, Transform, Load) model to an ELT (Extract, Load, Transform) model?
A. The transformation phase is completely removed from the data integration process
B. Raw data is loaded straight into the target data warehouse first, leveraging the warehouse’s compute power to execute transformations
C. Staging environments are forced to run inside the volatile memory of client desktop systems
D. Data must be completely encrypted before it can be extracted from source systems
View Answer & Explanation
Answer: B
Explanation: ELT shifts the resource-heavy transformation phase into the scalable target warehouse engine, processing raw data after ingest.
Q2. Which Change Data Capture (CDC) methodology provides the lowest performance impact on a live operational production source database (OLTP)?
A. Running comprehensive full-table comparison scans every hour
B. Database Transaction Log-Based CDC, which parses the database engine’s write-ahead logs asynchronously
C. Implementing synchronous triggers on every source table row insert/update query
D. Forcing operational users to log off the system during data extraction phases
View Answer & Explanation
Answer: B
Explanation: Log-based CDC reads the database transaction log offline or asynchronously, capturing mutations without adding overhead to live application tables.
Q3. Why do data warehouse integration pipelines utilize a dedicated “Staging Area”?
A. To provide a permanent public dashboard query environment for end business users
B. To isolate, clean, validate, and normalize raw extracted data without impacting production system or target warehouse performance
C. To store old archival tape database backups from twenty years ago
D. To accelerate the execution velocity of transaction routing firewalls
View Answer & Explanation
Answer: B
Explanation: Staging zones act as processing buffers where data is aggregated, cleaned, and scrubbed before it is safely written into the warehouse core.
Q4. What is the main objective of performing “Data Profiling” at the beginning of an ETL project?
A. Creating user access security profiles and assigning database passwords
B. Analytically examining source data to understand its structure, content, relationships, and data quality anomalies
C. Automatically writing the database schema code for the frontend dashboards
D. Deleting outdated database tables from the staging area repository
View Answer & Explanation
Answer: B
Explanation: Data profiling inspects raw source data attributes (null rates, range bounds, format variants) to align ETL cleansing logic with real-world inputs.
Q5. An ETL pipeline implements “Incremental Loading” (Delta Load) to:
A. Wipe out the entire data warehouse database and rebuild it from scratch every night
B. Identify and ingest only the new or modified source records that have accumulated since the last successful load window
C. Slow down the data processing speed across the distributed hardware framework
D. Convert all dimension tables into unindexed flat text files
View Answer & Explanation
Answer: B
Explanation: Incremental loads capture only new or updated records since the last execution run, shortening processing windows and saving system resources.
Q6. When an ETL process encounters a missing or null foreign key attribute in a source system record, how should it map the dimension link?
A. By completely deleting the entire fact transaction record to keep the table clean
B. By assigning the foreign key to a special pre-defined row in the dimension table representing “Unknown” or “Not Applicable”
C. By generating a random integer that points to a live customer profile record line
D. By halting the data warehouse cluster engine and triggering a global network reboot
View Answer & Explanation
Answer: B
Explanation: Pointing unmapped or missing source keys to a standardized “Unknown” row maintains referential integrity without losing fact records.
Q7. What is the difference between a “Direct Extraction” and an “Indirect Extraction” mechanism?
A. Direct extraction requires data to be converted into binary files before transport
B. Direct extraction queries the live source database directly; indirect extraction extracts data from intermediate export files or backups
C. Indirect extraction completely avoids using any form of metadata documentation
D. Direct extraction can only be executed against cloud-hosted data repository models
View Answer & Explanation
Answer: B
Explanation: Direct extraction reads data straight from live source schemas, while indirect extraction uses decoupled files to minimize system load.
Q8. What does the “High-Water Mark” technique track inside an ETL data integration control system?
A. The physical storage limit threshold of the data warehouse hard drive arrays
B. The maximum number of concurrent database users allowed to log into the reporting client
C. The last successfully processed timestamp or sequential identifier from a source system, guiding the next incremental load sweep
D. The count of index structures built across a multi-tiered snowflake design
View Answer & Explanation
Answer: C
Explanation: The high-water mark logs the point of the last successful data extract, giving the next batch run a precise starting point for new records.
Q9. In a relational data warehouse, why is referential integrity checking often “deferred” or handled by the ETL process rather than enforced by strict database constraints?
A. Relational data warehouse engines lack the technical capability to process standard foreign keys
B. Enforcing live row-by-row constraints at the database level severely bottlenecks bulk-loading data speeds
C. The Kimball methodology completely prohibits the use of database primary keys
D. Deferring integrity checks automatically encrypts the underlying data rows from view
View Answer & Explanation
Answer: B
Explanation: Handling constraint validations within the ETL pipeline avoids severe write bottlenecks caused by row-by-row foreign key checks during large bulk loads.
Q10. What constitutes an “Accumulating Snapshot Fact Table” design?
A. A fact table that logs a single row per transaction that is never modified after insertion
B. A row-per-entity fact table that tracks a defined lifecycle with multiple milestone dates, updating columns as progress occurs
C. A table that summarizes data at a static chronological interval, like the end of every business month
D. A dimension table that has been optimized with specialized bitmap index definitions
View Answer & Explanation
Answer: B
Explanation: Accumulating snapshots track indeterminate processing lifecycles by updating milestone date columns within a single row as events unfold.
Q11. A “Periodic Snapshot Fact Table” is uniquely optimized to capture:
A. Single, atomic real-time transaction occurrences as they happen second-by-second
B. A recurring, summarized slice of performance metrics at fixed intervals (e.g., end-of-month balances or inventory counts)
C. Text descriptions detailing customer profile address history shifts
D. System administration error log tracks from staging file integrations
View Answer & Explanation
Answer: B
Explanation: Periodic snapshots save uniform historical snapshots at set intervals, capturing time-series status trends like inventory balances.
Q12. What is the structural focus of a “Transaction Fact Table”?
A. Aggregating multi-year sales summaries into a single corporate data mart row
B. Capturing a discrete row for every individual business event or transactional scan at the lowest atomic level of detail
C. Storing system configuration rules and index mapping metadata records
D. Normalizing dimension attribute structures into nested sub-table hierarchies
View Answer & Explanation
Answer: B
Explanation: Transaction fact tables record events at the most granular level, providing a foundation for detailed analysis.
Q13. What does the process of “Data Enrichment” achieve inside an ETL workflow?
A. Erasing historical data rows to decrease physical disk space consumption
B. Enhancing internal datasets by appending descriptive context from external sources (such as mapping zip codes to demographic tiers)
C. Rewriting backend pipeline script code from SQL into alternative languages
D. Banning business users from running customized ad-hoc data query statements
View Answer & Explanation
Answer: B
Explanation: Enrichment layers extra value onto raw inputs during transformation by joining external, descriptive data streams.
Q14. Why do enterprise ETL architectures separate data transformation into a discrete layer rather than executing it inside source systems?
A. Source transactional systems are legally barred from performing basic join operations
B. Offloading transformation logic protects live production OLTP systems from performance degradation during business hours
C. Data warehouse database systems can only store clean binary unindexed structures
D. It eliminates the need to document technical metadata dictionary sheets
View Answer & Explanation
Answer: B
Explanation: Running complex analytical cleanups and aggregations on live transactional environments risks slowing down core customer operations.
Q15. What risk is encountered when an ETL job runs a massive bulk data load without disabling table indexing structures?
A. The database tables will be permanently deleted from the target storage system
B. Extreme load time delays as the database engine splits compute cycles to rebuild indexes row-by-row for millions of incoming inserts
C. The pipeline will automatically convert the star schema layout into a galaxy model
D. End users will be completely granted administrative access to backend database schemas
View Answer & Explanation
Answer: B
Explanation: Bulk-loading data into fully indexed tables bogs down performance because the engine must recalculate indexing paths for every row insertion.
Conclusion
These Data Warehousing MCQs questions covered advanced concepts such as ELT pipelines, CDC mechanisms, incremental loading, snapshot fact tables, data enrichment, and enterprise ETL optimization strategies.
Practicing these questions is useful for Big Data Technology, Enterprise ETL Systems, GATE CS, UGC NET, and university semester examinations.
For better understanding of theory and concepts refer to ETL Process In Data Warehouse.