Cognizant Data Analyst Interview Questions
Cognizant Data Analyst interviews test SQL querying, Excel/Power BI dashboards, Python pandas, and basic statistics. Expect technical questions and scenario-based analytics problems from healthcare, retail, or BFSI (banking, financial services, and insurance) domains.
Foundation Questions - Guaranteed to Appear
1
How do you clean a dataset with missing values and outliers in Python pandas?
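Count nulls per column: df.isnull().sum(). Fill numeric nulls with the median, which is robust to outliers: df['salary'] = df['salary'].fillna(df['salary'].median()). Assign the result back instead of calling fillna(..., inplace=True) on a selected column; recent pandas versions warn about that chained pattern and it may not update df. Drop rows where a critical column is null: df = df.dropna(subset=['customer_id']). IQR outlier detection: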
Q1 = df['revenue'].quantile(0.25)
Q3 = df['revenue'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['revenue'] >= Q1 - 1.5*IQR) & (df['revenue'] <= Q3 + 1.5*IQR)]
Document why you chose the median over the mean: in a skewed distribution, extreme values pull the mean but barely move the median.
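The steps above can be tried end to end on a toy frame. This is a minimal sketch, not a definitive recipe: the column names match the ones used in the answer, while the sample values and the helper name clean are invented for illustration.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the null handling and IQR filter described above (illustrative helper)."""
    out = df.copy()
    # Rows without a customer_id cannot be attributed to anyone, so drop them.
    out = out.dropna(subset=["customer_id"])
    # Median imputation: less sensitive to extreme salaries than the mean.
    out["salary"] = out["salary"].fillna(out["salary"].median())
    # Keep only rows whose revenue lies within 1.5 * IQR of the quartiles.
    q1 = out["revenue"].quantile(0.25)
    q3 = out["revenue"].quantile(0.75)
    iqr = q3 - q1
    in_range = out["revenue"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[in_range].reset_index(drop=True)


# Invented sample: one missing customer_id, one missing salary, one extreme revenue.
raw = pd.DataFrame({
    "customer_id": [1, 2, None, 4, 5, 6],
    "salary": [50000.0, None, 42000.0, 61000.0, 58000.0, 55000.0],
    "revenue": [100.0, 120.0, 110.0, 130.0, 90.0, 5000.0],
})
cleaned = clean(raw)
print(cleaned)
```

Working on a copy keeps the raw data available, which makes it easy to report how many rows each step removed.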
2
What is the difference between VLOOKUP and INDEX-MATCH in Excel?
VLOOKUP requires the lookup column to be the leftmost column of the range and returns a value by a hard-coded column index, so it breaks when columns are inserted or reordered. Formula: =VLOOKUP(A2, B:D, 3, FALSE). INDEX-MATCH: MATCH finds the position of the lookup value and INDEX returns the value at that position, so the lookup column can be anywhere. Formula: =INDEX(D:D, MATCH(A2, B:B, 0)). At Cognizant, always use INDEX-MATCH for production dashboards because clients often modify column order. In Excel 365, XLOOKUP replaces both: it supports left-lookup, returns arrays, and handles not-found values natively.
3
Explain the difference between correlation and causation. Give a business example.
Correlation measures the strength and direction of the linear relationship between two variables (Pearson r ranges from -1 to +1). Causation means a change in one variable directly produces a change in the other. Example: customers who received promotional emails had higher purchase rates (correlation), but high-value customers were more likely to have opted in already, so the gap reflects selection bias rather than a causal effect of the email. A properly randomized A/B test is the most reliable way to establish causation. Measure incremental lift: email group vs no-email group, randomly assigned.
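The email example can be illustrated with a small simulation. This is a sketch under invented assumptions: every probability below is made up, and the email is given no effect at all, so any gap in the opt-in comparison comes purely from selection bias.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable


def purchase_rate(rows):
    """Share of customers in `rows` who purchased."""
    return sum(r["purchased"] for r in rows) / len(rows)


def rate_gap(customers, flag):
    """Purchase rate of customers with `flag` set minus the rate of those without."""
    with_flag = [c for c in customers if c[flag]]
    without_flag = [c for c in customers if not c[flag]]
    return purchase_rate(with_flag) - purchase_rate(without_flag)


customers = []
for _ in range(20_000):
    high_value = random.random() < 0.30
    customers.append({
        # Purchases depend only on customer value; the email has NO effect here.
        "purchased": random.random() < (0.30 if high_value else 0.05),
        # Observational setup: high-value customers opt in far more often.
        "opted_in": random.random() < (0.80 if high_value else 0.20),
        # Experimental setup: a coin flip decides who gets the email.
        "randomized": random.random() < 0.50,
    })

naive_gap = rate_gap(customers, "opted_in")         # inflated by selection bias
randomized_gap = rate_gap(customers, "randomized")  # close to the true effect of zero
print(f"opt-in comparison gap:     {naive_gap:.3f}")
print(f"randomized comparison gap: {randomized_gap:.3f}")
```

The opt-in comparison shows a large gap even though the email does nothing, while the randomized comparison stays near zero, which is the argument for random assignment in one picture.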
4
How do you validate the quality of a data pipeline's output?
Data quality checks I implement: 1) Row count validation — source count vs target count within tolerance. 2) Null checks — critical columns like customer_id and transaction_date cannot have nulls. 3) Referential integrity — all foreign keys in fact table must exist in dimension tables. 4) Range checks — amounts and dates within expected bounds. 5) Duplicate checks — unique keys must be unique. 6) Business rule validation — pipeline total equals source system report total. Implement as automated SQL/Python checks writing results to a data_quality_log table with PagerDuty alerts on failures.
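The six checks above could be sketched in pandas roughly as follows. The function name run_quality_checks, the table layouts, and the amount bounds (0 to 1,000,000) are assumptions made for illustration; writing results to data_quality_log and triggering alerts are left out.

```python
import pandas as pd


def run_quality_checks(source, fact, dim_customer, tolerance=0.0):
    """Return {check_name: passed} for the six checks (hypothetical helper)."""
    key_cols = ["customer_id", "transaction_date"]
    known_customers = dim_customer["customer_id"]
    return {
        # 1) Row counts agree within a relative tolerance.
        "row_count": abs(len(source) - len(fact)) <= tolerance * len(source),
        # 2) Critical columns contain no nulls.
        "no_nulls": bool(fact[key_cols].notna().all().all()),
        # 3) Every customer_id in the fact table exists in the dimension table.
        "referential_integrity": bool(
            fact["customer_id"].dropna().isin(known_customers).all()
        ),
        # 4) Amounts fall inside assumed bounds.
        "amount_range": bool(fact["amount"].between(0, 1_000_000).all()),
        # 5) The transaction key is unique.
        "unique_keys": bool(fact["transaction_id"].is_unique),
        # 6) Pipeline total matches the source total (to the cent).
        "totals_match": bool(abs(fact["amount"].sum() - source["amount"].sum()) < 0.01),
    }


# Invented sample: customer 12 is missing from the dimension table.
source = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "customer_id": [10, 11, 12],
    "transaction_date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "amount": [100.0, 250.0, 75.0],
})
fact = source.copy()
dim_customer = pd.DataFrame({"customer_id": [10, 11]})

results = run_quality_checks(source, fact, dim_customer)
failed = [name for name, passed in results.items() if not passed]
print(failed)
```

Returning one boolean per named check makes it straightforward to log each result as a row and alert only on the names in failed.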
5
What is A/B testing and how would you analyze results for an e-commerce client?
A/B testing randomly splits users into Control (existing experience) and Variant (new design) to measure causal impact. Analysis: 1) Define the metric: conversion rate = orders/sessions. 2) Calculate the minimum sample size via power analysis (80% power, 5% significance) from the baseline rate and the minimum detectable effect. 3) Run the test until the sample size is met; do not stop early on a promising interim result, because repeated peeking inflates the false-positive rate. 4) Test significance with a two-proportion z-test or chi-square test. 5) Report the effect with its uncertainty, e.g. 'B improved conversion by 12% +/- 3% (95% CI).' Segment results by device type, since mobile users often behave very differently from desktop users.
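Steps 2, 4, and 5 above can be sketched with the Python standard library alone. This is a minimal illustration using the usual normal-approximation formulas; the function names and the session and order counts are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

NORMAL = NormalDist()  # standard normal distribution


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.80):
    """Approximate sessions needed per arm to detect p_base -> p_target."""
    z_alpha = NORMAL.inv_cdf(1 - alpha / 2)
    z_power = NORMAL.inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_target - p_base) ** 2)


def two_proportion_ztest(orders_a, sessions_a, orders_b, sessions_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_a, p_b = orders_a / sessions_a, orders_b / sessions_b
    pooled = (orders_a + orders_b) / (sessions_a + sessions_b)
    se = sqrt(pooled * (1 - pooled) * (1 / sessions_a + 1 / sessions_b))
    z = (p_b - p_a) / se
    return z, 2 * (1 - NORMAL.cdf(abs(z)))


def diff_confidence_interval(orders_a, sessions_a, orders_b, sessions_b, level=0.95):
    """Wald interval for the absolute difference p_b - p_a (unpooled SE)."""
    p_a, p_b = orders_a / sessions_a, orders_b / sessions_b
    se = sqrt(p_a * (1 - p_a) / sessions_a + p_b * (1 - p_b) / sessions_b)
    margin = NORMAL.inv_cdf(1 - (1 - level) / 2) * se
    return (p_b - p_a) - margin, (p_b - p_a) + margin


# Hypothetical plan: baseline 5% conversion, smallest effect worth detecting is 6%.
n_required = sample_size_per_group(0.05, 0.06)

# Hypothetical results: control 500/10,000 sessions, variant 580/10,000.
z, p_value = two_proportion_ztest(500, 10_000, 580, 10_000)
ci_low, ci_high = diff_confidence_interval(500, 10_000, 580, 10_000)
print(n_required, round(z, 2), round(p_value, 4), (round(ci_low, 4), round(ci_high, 4)))
```

The interval here is on the absolute difference in conversion rate (percentage points); say explicitly in the report whether a quoted lift is absolute or relative.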
6
Write pandas code to find top 3 products by revenue for each region.
top3 = (
    df.groupby(['region', 'product'])['revenue']
      .sum().reset_index()
      .sort_values(['region', 'revenue'], ascending=[True, False])
      .groupby('region').head(3)
      .reset_index(drop=True)
)
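Alternative using rank() (aggregate first so that products, not individual rows, are ranked):
agg = df.groupby(['region', 'product'], as_index=False)['revenue'].sum()
agg['rnk'] = agg.groupby('region')['revenue'].rank(method='dense', ascending=False)
result = agg[agg['rnk'] <= 3]
With method='dense', tied products share a rank, so a region can return more than three rows.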
At Cognizant, always display percentage contribution alongside absolute revenue for business context.
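One way to add the percentage contribution is a groupby transform on the aggregated frame. The sketch below uses invented sample data, and the column name pct_of_region is an assumption.

```python
import pandas as pd

# Invented sample data for illustration.
df = pd.DataFrame({
    "region": ["North"] * 5 + ["South"] * 3,
    "product": ["A", "B", "C", "D", "A", "A", "B", "B"],
    "revenue": [100.0, 400.0, 300.0, 100.0, 100.0, 400.0, 100.0, 500.0],
})

# Total revenue per product within each region.
agg = df.groupby(["region", "product"], as_index=False)["revenue"].sum()

# Share of each product in its region's total revenue, in percent.
region_total = agg.groupby("region")["revenue"].transform("sum")
agg["pct_of_region"] = (agg["revenue"] / region_total * 100).round(1)

# Top 3 products per region, with absolute revenue and share side by side.
top3 = (
    agg.sort_values(["region", "revenue"], ascending=[True, False])
       .groupby("region")
       .head(3)
       .reset_index(drop=True)
)
print(top3)
```

Computing the share before filtering keeps the percentages relative to the full regional total rather than to the top three alone.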