Capgemini | Python Data Analyst

Capgemini Python and Data Analyst Interview Questions

Capgemini Python/Data Analyst interviews test pandas data manipulation, SQL querying, visualization (matplotlib/seaborn), basic ML concepts, and business analytics scenarios. Expect a coding test followed by 2 technical interviews and one HR round.

Foundation Questions - Guaranteed to Appear

What is the difference between merge(), join(), and concat() in pandas?

pd.concat(): stacks DataFrames vertically (axis=0) or horizontally (axis=1), like SQL UNION ALL when stacking rows. It does no key matching; it aligns on column labels when stacking rows and on index labels when placing frames side by side. df.merge(): combines DataFrames on one or more common columns or on the index, like SQL JOIN. It supports inner (the default), left, right, outer, and cross joins. df.join(): a convenience wrapper around merge that joins on the index by default and performs a left join unless how is specified. Most common follow-up: merging when the key columns have different names, which merge() handles with left_on='cust_id', right_on='customer_id'.
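
A minimal sketch of the three calls side by side may help in an interview. The DataFrames and column names below (orders, customers, cust_id, customer_id) are made-up sample data, not taken from any real client schema.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
orders = pd.DataFrame({"cust_id": [1, 2, 4], "amount": [250, 120, 90]})
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "city": ["Pune", "Mumbai", "Delhi"]}
)

# merge(): key-based join where the key columns have different names.
# how="left" keeps every order, even when no customer row matches.
merged = orders.merge(
    customers, left_on="cust_id", right_on="customer_id", how="left"
)

# join(): index-to-index join, so move the keys into the index first.
joined = orders.set_index("cust_id").join(
    customers.set_index("customer_id"), how="inner"
)

# concat(): stack rows with no key matching; columns are aligned by label.
stacked = pd.concat([orders, orders], ignore_index=True)

print(merged)
print(joined)
print(stacked)
```

In this sample, the left merge keeps the order for cust_id 4 with missing customer fields, while the inner join drops it because there is no matching customer.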

Explain statistical significance and p-value in simple terms.

The p-value is the probability of observing a result at least as extreme as yours if the null hypothesis (no effect exists) were true. If p < 0.05, we reject H0: the result is statistically significant at the 5% level. Business translation: 'We ran an A/B test. The new checkout button increased conversions by 8%. A p-value of 0.03 means that if the button truly made no difference, we would see a lift this large only about 3% of the time, so we should implement it.' Avoid saying there is a 3% chance the improvement is random noise; the p-value is not the probability that the null hypothesis is true. Never say p < 0.05 proves the effect; it means the evidence is strong enough to act on.
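
To make the A/B example concrete, here is a rough sketch of how such a p-value could be computed with a two-proportion z-test, using only the standard library. The function name and the visitor and conversion counts are invented for illustration, and the normal approximation assumes reasonably large samples.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both variants share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 the best estimate of the shared rate pools both groups.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail area of the standard normal: P(|Z| >= |z|).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: control converts 500 of 10,000, variant 570 of 10,000.
p_value = two_proportion_p_value(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
print(f"p-value: {p_value:.4f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

In practice a library routine (for example from statsmodels or scipy) would usually replace the hand-rolled formula; writing it out simply shows what the number measures.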

How would you visualize sales trends for a retail client using Python?

import matplotlib.pyplot as plt
import seaborn as sns

# Monthly revenue trend line chart (df needs 'month' and 'revenue' columns)
monthly = df.groupby('month')['revenue'].sum().reset_index()
plt.figure(figsize=(12, 5))
plt.plot(monthly['month'], monthly['revenue'], marker='o', linewidth=2, color='#667eea')
plt.fill_between(monthly['month'], monthly['revenue'], alpha=0.1, color='#667eea')
plt.title('Monthly Revenue Trend 2026', fontsize=16, fontweight='bold')
plt.xlabel('Month')
plt.ylabel('Revenue')
plt.tight_layout()
plt.savefig('revenue_trend.png', dpi=300)

# Heatmap for category x region, drawn on its own figure so it does not overlay the line chart
pivot = df.pivot_table(values='revenue', index='category', columns='region', aggfunc='sum')
plt.figure(figsize=(8, 6))
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlOrRd')
plt.tight_layout()

What is linear regression and how would you interpret its output?

Linear regression models the relationship Y = beta0 + beta1*X1 + beta2*X2 + error. Interpreting the output: Coefficients (beta): for every 1-unit increase in a predictor X, Y changes by beta units on average, holding the other variables constant. R-squared: the proportion of variance in Y explained by the model (0.85 means 85% of the variation in sales is explained). P-values for coefficients test whether each predictor's coefficient differs significantly from zero. At Capgemini client presentations, translate the numbers into business terms: if ad spend is measured in thousands of rupees, beta1 = 2500 reads as 'each additional Rs 1000 spent on digital ads is associated with Rs 2500 more revenue, a 2.5x return on that spend.'
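
As a small illustration of where the coefficients and R-squared come from, the sketch below fits a one-predictor model with NumPy's least-squares solver. The data is synthetic and noise-free (ad spend in Rs thousand, revenue in Rs), chosen so the fit recovers beta1 = 2500 from the example; real data would be noisy and would normally be fitted with a statistics library that also reports p-values.

```python
import numpy as np

# Synthetic data, for illustration only: ad spend in Rs thousand, revenue in Rs.
ad_spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
revenue = 5000.0 + 2500.0 * ad_spend  # exact linear relationship, no noise

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(ad_spend), ad_spend])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
beta0, beta1 = coef

# R-squared = 1 - (residual sum of squares / total sum of squares).
predicted = X @ coef
ss_res = np.sum((revenue - predicted) ** 2)
ss_tot = np.sum((revenue - revenue.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"intercept={beta0:.1f}, slope={beta1:.1f}, R^2={r_squared:.3f}")
```

Because the synthetic data lies exactly on a line, the fitted slope matches the generating value and R-squared is essentially 1; with real data both would carry uncertainty.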

Write a Python function to calculate customer churn rate and average revenue per churned customer.

def analyze_churn(df):
    # df columns: customer_id, status ('active'/'churned'), monthly_revenue
    total_customers = len(df)
    if total_customers == 0:
        raise ValueError('df is empty; churn rate is undefined')
    churned = df[df['status'] == 'churned']
    return {
        'churn_rate_percent': round(len(churned) / total_customers * 100, 2),
        # mean() returns NaN when no customers have churned
        'avg_revenue_churned': round(churned['monthly_revenue'].mean(), 2),
        'total_revenue_at_risk': round(churned['monthly_revenue'].sum(), 2),
        'churned_count': len(churned),
    }

result = analyze_churn(df)
print(f'Churn Rate: {result["churn_rate_percent"]}%')

At Capgemini, also segment churn by acquisition channel and product tier to identify root causes.
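
The segmentation mentioned above could be sketched with a groupby, as below. The channel column and the sample rows are assumptions made for illustration; they are not part of the function's stated input.

```python
import pandas as pd

# Hypothetical data; 'channel' is an assumed column name.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "status": ["active", "churned", "churned", "active", "active", "churned"],
    "channel": ["ads", "ads", "referral", "referral", "referral", "ads"],
    "monthly_revenue": [500, 300, 450, 700, 650, 200],
})

# The mean of a boolean flag is the share of churned customers per group.
churn_by_channel = (
    df.assign(churned=df["status"].eq("churned"))
      .groupby("channel")["churned"]
      .mean()
      .mul(100)
      .round(2)
)

print(churn_by_channel)
```

The same pattern extends to product tier by grouping on a list of columns, for example ["channel", "tier"] if such a column exists.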

What is the difference between supervised and unsupervised learning? Give a business example.

Supervised learning trains on labeled data, for example predicting customer churn (binary classification) or next quarter's revenue (regression). Unsupervised learning finds hidden patterns in unlabeled data: K-Means customer segmentation groups customers by purchase behavior without predefined categories, surfacing segments such as high-value customers, discount-seekers, and loyal customers. Association rule mining with the Apriori algorithm is another unsupervised example: market basket analysis finds product associations (customers buying X also buy Y). Capgemini pipeline: use unsupervised clustering to discover customer segments, then train a supervised classifier to predict which new customers will join the high-value segment.
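
The two-step pipeline described above might look roughly like this with scikit-learn. The customer features, the two synthetic groups, and the choice of logistic regression are all assumptions for the sake of a short example; a real project would likely train the classifier on features available at signup rather than on the same purchase data used for clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic customers: [orders per year, average order value]. Illustrative only.
rng = np.random.default_rng(0)
low_value = rng.normal(loc=[5, 200], scale=[1, 20], size=(50, 2))
high_value = rng.normal(loc=[30, 1500], scale=[3, 100], size=(50, 2))
customers = np.vstack([low_value, high_value])

# Scale features so both contribute comparably to the distance metric.
scaler = StandardScaler().fit(customers)
scaled = scaler.transform(customers)

# Step 1 (unsupervised): discover segments without any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
segments = kmeans.labels_

# Step 2 (supervised): learn to predict the discovered segment.
clf = LogisticRegression().fit(scaled, segments)

# Score a new (hypothetical) customer with the trained classifier.
new_customer = scaler.transform(np.array([[28.0, 1400.0]]))
predicted_segment = clf.predict(new_customer)[0]
print(f"Predicted segment id: {predicted_segment}")
```

Cluster ids are arbitrary, so the segment that corresponds to "high-value" has to be identified by inspecting the cluster centers before the prediction is reported to a client.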
