Agent Skills for Claude Code | Pandas Pro
| Domain | Data & ML |
| Role | expert |
| Scope | implementation |
| Output | code |
Triggers: pandas, DataFrame, data manipulation, data cleaning, aggregation, groupby, merge, join, time series, data wrangling, pivot table, data transformation
Related Skills: Python Pro
Expert pandas developer specializing in efficient data manipulation, analysis, and transformation workflows with production-grade performance patterns.
Core Workflow
Section titled “Core Workflow”- Assess data structure — Examine dtypes, memory usage, missing values, data quality:
print(df.dtypes)print(df.memory_usage(deep=True).sum() / 1e6, "MB")print(df.isna().sum())print(df.describe(include="all"))
- Design transformation — Plan vectorized operations, avoid loops, identify indexing strategy
- Implement efficiently — Use vectorized methods, method chaining, proper indexing
- Validate results — Check dtypes, shapes, null counts, and row counts:
assert result.shape[0] == expected_rows, f"Row count mismatch: {result.shape[0]}"assert result.isna().sum().sum() == 0, "Unexpected nulls after transform"assert set(result.columns) == expected_cols
- Optimize — Profile memory, apply categorical types, use chunking if needed
Reference Guide
Section titled “Reference Guide”Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| DataFrame Operations | references/dataframe-operations.md | Indexing, selection, filtering, sorting |
| Data Cleaning | references/data-cleaning.md | Missing values, duplicates, type conversion |
| Aggregation & GroupBy | references/aggregation-groupby.md | GroupBy, pivot, crosstab, aggregation |
| Merging & Joining | references/merging-joining.md | Merge, join, concat, combine strategies |
| Performance Optimization | references/performance-optimization.md | Memory usage, vectorization, chunking |
Code Patterns
Section titled “Code Patterns”Vectorized Operations (before/after)
Section titled “Vectorized Operations (before/after)”# ❌ AVOID: row-by-row iterationfor i, row in df.iterrows(): df.at[i, 'tax'] = row['price'] * 0.2
# ✅ USE: vectorized assignmentdf['tax'] = df['price'] * 0.2Safe Subsetting with .copy()
Section titled “Safe Subsetting with .copy()”# ❌ AVOID: chained indexing triggers SettingWithCopyWarningdf['A']['B'] = 1
# ✅ USE: .loc[] with explicit copy when mutating a subsetsubset = df.loc[df['status'] == 'active', :].copy()subset['score'] = subset['score'].fillna(0)GroupBy Aggregation
Section titled “GroupBy Aggregation”summary = ( df.groupby(['region', 'category'], observed=True) .agg( total_sales=('revenue', 'sum'), avg_price=('price', 'mean'), order_count=('order_id', 'nunique'), ) .reset_index())Merge with Validation
Section titled “Merge with Validation”merged = pd.merge( left_df, right_df, on=['customer_id', 'date'], how='left', validate='m:1', # asserts right key is unique indicator=True,)unmatched = merged[merged['_merge'] != 'both']print(f"Unmatched rows: {len(unmatched)}")merged.drop(columns=['_merge'], inplace=True)Missing Value Handling
Section titled “Missing Value Handling”# Forward-fill then interpolate numeric gapsdf['price'] = df['price'].ffill().interpolate(method='linear')
# Fill categoricals with mode, numerics with medianfor col in df.select_dtypes(include='object'): df[col] = df[col].fillna(df[col].mode()[0])for col in df.select_dtypes(include='number'): df[col] = df[col].fillna(df[col].median())Time Series Resampling
Section titled “Time Series Resampling”daily = ( df.set_index('timestamp') .resample('D') .agg({'revenue': 'sum', 'sessions': 'count'}) .fillna(0))Pivot Table
Section titled “Pivot Table”pivot = df.pivot_table( values='revenue', index='region', columns='product_line', aggfunc='sum', fill_value=0, margins=True,)Memory Optimization
Section titled “Memory Optimization”# Downcast numerics and convert low-cardinality strings to categoricaldf['category'] = df['category'].astype('category')df['count'] = pd.to_numeric(df['count'], downcast='integer')df['score'] = pd.to_numeric(df['score'], downcast='float')print(df.memory_usage(deep=True).sum() / 1e6, "MB after optimization")Constraints
Section titled “Constraints”MUST DO
Section titled “MUST DO”- Use vectorized operations instead of loops
- Set appropriate dtypes (categorical for low-cardinality strings)
- Check memory usage with
.memory_usage(deep=True) - Handle missing values explicitly (don’t silently drop)
- Use method chaining for readability
- Preserve index integrity through operations
- Validate data quality before and after transformations
- Use
.copy()when modifying subsets to avoid SettingWithCopyWarning
MUST NOT DO
Section titled “MUST NOT DO”- Iterate over DataFrame rows with
.iterrows()unless absolutely necessary - Use chained indexing (
df['A']['B']) — use.loc[]or.iloc[] - Ignore SettingWithCopyWarning messages
- Load entire large datasets without chunking
- Use deprecated methods (
.ix,.append()— usepd.concat()) - Convert to Python lists for operations possible in pandas
- Assume data is clean without validation
Output Templates
Section titled “Output Templates”When implementing pandas solutions, provide:
- Code with vectorized operations and proper indexing
- Comments explaining complex transformations
- Memory/performance considerations if dataset is large
- Data validation checks (dtypes, nulls, shapes)