Accenture Data Engineer Interview Questions

Accenture Data Engineer interviews focus on cloud platforms (Azure, AWS), ETL pipeline design, SQL optimization, and Python scripting. Expect 2-3 technical rounds with scenario-based questions from project leads.

Updated: 2026
Foundation Questions - Guaranteed to Appear
1. How would you design an ETL pipeline in Azure Data Factory for a retail client at Accenture?
Design a medallion-layer pipeline. Raw POS data lands in Azure Data Lake Storage Gen2 (Bronze). An ADF Mapping Data Flow cleanses and deduplicates it (Silver). Synapse Spark aggregates daily KPIs by store and SKU into the Gold layer, which Power BI consumes. Parameterize pipelines and datasets (source path, schema mapping, client ID) so one pipeline serves multiple clients with different source schemas, and reserve global parameters for factory-wide constants such as environment names.
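The Bronze, Silver, and Gold flow can be sketched in plain Python to show what each layer is responsible for. This is only an in-memory illustration, not ADF or Synapse code: the row layout, the `txn_id` deduplication key, and the function names `to_silver` and `to_gold` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw POS rows as they might land in the Bronze layer (all strings, unvalidated).
BRONZE = [
    {"txn_id": "t1", "store": "S1", "sku": "A", "date": "2024-01-01", "amount": "10.50"},
    {"txn_id": "t1", "store": "S1", "sku": "A", "date": "2024-01-01", "amount": "10.50"},  # duplicate
    {"txn_id": "t2", "store": "S1", "sku": "A", "date": "2024-01-01", "amount": "4.50"},
    {"txn_id": "t3", "store": "S2", "sku": "B", "date": "2024-01-01", "amount": None},  # missing amount
    {"txn_id": "t4", "store": "S2", "sku": "B", "date": "2024-01-01", "amount": "7.00"},
]


def to_silver(bronze_rows):
    """Cleanse and deduplicate: drop rows without an amount, keep the first row per txn_id."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if row["amount"] is None or row["txn_id"] in seen:
            continue
        seen.add(row["txn_id"])
        silver.append({**row, "amount": float(row["amount"])})
    return silver


def to_gold(silver_rows):
    """Aggregate daily sales by (date, store, sku), the grain a BI report would read."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[(row["date"], row["store"], row["sku"])] += row["amount"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
for key, total in sorted(gold.items()):
    print(key, total)
```

In a real pipeline each function would correspond to a separate activity writing to its own storage path, so that a failed Gold aggregation can be rerun without re-ingesting the raw data.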
2. What is the difference between INNER JOIN, LEFT JOIN, and CROSS JOIN?
INNER JOIN returns only rows that have a match in both tables. LEFT JOIN returns all rows from the left table plus matched rows from the right, with NULLs where there is no match; filtering on a NULL right-side key makes it useful for orphan-detection audits. CROSS JOIN produces a Cartesian product (every left row paired with every right row); it is rarely used in production but is useful for seed data generation.
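The three joins can be compared side by side on a tiny dataset. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `customers` and `orders` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN + IS NULL filter: customers with no order at all.
no_order_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL
""").fetchall()

# CROSS JOIN: every customer paired with every order (3 x 3 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

print(inner_rows)     # [('Asha', 10), ('Asha', 11), ('Ben', 12)]
print(no_order_rows)  # [('Chen',)]
print(cross_count)    # 9
```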
3. Explain partitioning strategies in Apache Spark.
Hash partitioning assigns each row to partition hash(key) mod N. It is the usual choice for joins because matching keys land in the same partition. Range partitioning assigns rows to partitions by contiguous, sorted key ranges, which suits time-series processing and ordered output. On write, partitionBy lays files out in one directory per value of a column such as date or region; this enables partition pruning and matters most for Parquet tables read via the Hive metastore.
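The difference between hash and range assignment is easy to see in a few lines of plain Python. This is a conceptual sketch only: Spark uses its own hash function and samples the data to pick range boundaries, whereas here `zlib.crc32` and the hand-written quarter-end boundaries are stand-ins chosen for the example.

```python
import bisect
import zlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Assign a key to a partition by hash(key) mod N (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def range_partition(key: str, upper_bounds: list) -> int:
    """Assign a key to the first range whose upper bound is >= key.

    Keys above the last bound fall into one extra, final partition.
    """
    return bisect.bisect_left(upper_bounds, key)


# Hash: the same join key always maps to the same partition on both sides of a join.
orders_side = hash_partition("cust-42", 8)
customers_side = hash_partition("cust-42", 8)

# Range: ISO dates sort lexicographically, so quarter-end bounds split the year into 4 ranges.
quarter_ends = ["2024-03-31", "2024-06-30", "2024-09-30"]
dates = ["2024-01-10", "2024-03-31", "2024-04-15", "2024-12-01"]
assigned = [range_partition(d, quarter_ends) for d in dates]

print(orders_side == customers_side)  # True
print(assigned)                       # [0, 0, 1, 3]
```

Range assignment keeps neighbouring dates together, which is why it helps ordered scans, while hash assignment spreads keys evenly but destroys ordering.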
4. How do you handle Slowly Changing Dimensions (SCD)?
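SCD Type 1 overwrites the old value and keeps no history. SCD Type 2 adds a new row with start/end dates and a current_flag; use it for customer address history where revenue attribution matters. SCD Type 3 adds a previous_value column, so it keeps only one level of history. To implement Type 2 in an ADF Mapping Data Flow, match incoming rows to the dimension on the business key, use an Alter Row transformation to expire the current row (set its end date and clear current_flag), and insert the new version under a new surrogate key.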
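The Type 2 logic (expire the current row, insert a new version) can be sketched independently of any tool. The function below is a minimal in-memory illustration, not ADF code; the column names, the `apply_scd2` helper, and the choice of `customer_id` as the business key are assumptions for the example.

```python
def apply_scd2(dimension, incoming, load_date):
    """Return a new dimension list with SCD Type 2 changes applied.

    dimension: list of row dicts; incoming: {customer_id: address} from the source system.
    """
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = {row["customer_id"]: row for row in result if row["current_flag"]}
    next_key = max((row["surrogate_key"] for row in result), default=0) + 1

    for customer_id, address in incoming.items():
        existing = current.get(customer_id)
        if existing is not None:
            if existing["address"] == address:
                continue  # unchanged: nothing to do
            # Changed: expire the current version instead of overwriting it.
            existing["end_date"] = load_date
            existing["current_flag"] = False
        # New customer or changed attribute: insert a fresh current row.
        result.append({
            "surrogate_key": next_key,
            "customer_id": customer_id,
            "address": address,
            "start_date": load_date,
            "end_date": None,
            "current_flag": True,
        })
        next_key += 1
    return result


dim = [{"surrogate_key": 1, "customer_id": 1, "address": "12 Oak St",
        "start_date": "2023-01-01", "end_date": None, "current_flag": True}]
updated = apply_scd2(dim, {1: "9 Elm Rd", 2: "4 Pine Ave"}, "2024-06-01")
for row in updated:
    print(row)
```

Running the same load twice leaves the dimension unchanged, which is the idempotency property worth mentioning in an interview.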
5. What are window functions in SQL? Give a practical example.
Window functions compute values across a set of related rows without collapsing them the way GROUP BY does. Example, a running total per salesperson: SUM(sales_amount) OVER (PARTITION BY salesperson_id ORDER BY sale_month). Common at Accenture: ROW_NUMBER() for deduplication, LAG/LEAD for month-over-month change, and NTILE(n) for quantile buckets such as percentiles.
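The running-total expression above, plus a LAG-based month-over-month change, can be run against an in-memory SQLite database. This sketch assumes the SQLite library bundled with Python is version 3.25 or later (the first release with window functions); the `sales` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (salesperson_id INTEGER, sale_month TEXT, sales_amount INTEGER);
    INSERT INTO sales VALUES
        (1, '2024-01', 100), (1, '2024-02', 150), (1, '2024-03', 50),
        (2, '2024-01', 200), (2, '2024-02', 100);
""")

rows = conn.execute("""
    SELECT salesperson_id,
           sale_month,
           sales_amount,
           SUM(sales_amount) OVER (
               PARTITION BY salesperson_id ORDER BY sale_month
           ) AS running_total,
           sales_amount - LAG(sales_amount) OVER (
               PARTITION BY salesperson_id ORDER BY sale_month
           ) AS mom_change
    FROM sales
    ORDER BY salesperson_id, sale_month
""").fetchall()

for row in rows:
    print(row)
```

Every input row survives in the output; the first month of each salesperson has a NULL change because LAG has no previous row to read.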
6. How would you optimize a slow Python ETL script processing 10M records?
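Replace row-by-row iteration with vectorized pandas/NumPy operations, which are often one to two orders of magnitude faster. Read in chunks with pd.read_csv(chunksize=100000) to avoid running out of memory. Push set-based transforms down to the database as SQL, and load with bulk inserts (for example through SQLAlchemy) rather than one INSERT per row. Parallelize with concurrent.futures.ThreadPoolExecutor for I/O-bound work or ProcessPoolExecutor for CPU-bound work. When the data outgrows a single machine, migrate to PySpark on Azure Databricks.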
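Two of these ideas, chunked reading and bulk loading with the aggregation pushed into the database, can be shown with the standard library alone. This is a sketch under stated assumptions: `csv` plus `itertools.islice` stands in for `pd.read_csv(chunksize=...)`, `sqlite3.executemany` stands in for a SQLAlchemy bulk insert, and the `sales` schema and helper names are hypothetical.

```python
import csv
import io
import itertools
import sqlite3


def read_chunks(file_obj, chunk_size):
    """Yield lists of at most chunk_size rows so the whole file is never in memory."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def load(file_obj, conn, chunk_size=100_000):
    """Bulk-insert the file chunk by chunk; returns the number of rows loaded."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, amount REAL)")
    total = 0
    for chunk in read_chunks(file_obj, chunk_size):
        rows = [(r["store"], float(r["amount"])) for r in chunk]
        with conn:  # one transaction per chunk instead of one per row
            conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        total += len(rows)
    return total


# Tiny in-memory sample; a real run would pass an open file handle.
sample = io.StringIO("store,amount\nS1,10.5\nS2,4.5\nS1,5.0\n")
conn = sqlite3.connect(":memory:")
loaded = load(sample, conn, chunk_size=2)

# The aggregation runs inside the database rather than in a Python loop.
totals = conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(loaded, totals)
```

Memory use is bounded by `chunk_size` rather than by file size, which is the property that matters at 10M rows.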
