Excel-to-Database Processing Pipeline
Case Study — Batch Data Processing
Role: Full-Stack Developer
Duration: 3 months
Team: 2 developers
Status: In production
The Challenge
A company's operations team was manually importing Excel spreadsheets into their database — copy-pasting rows, reformatting data, fixing inconsistencies, and running validation by hand. This process took 4-6 hours per import, happened 3-4 times per week, and was error-prone. Data inconsistencies from manual entry were causing downstream reporting issues.
The Approach
I designed a fully automated batch processing pipeline to replace the manual workflow:
- Analysis — Mapped all Excel formats, identified validation rules, documented edge cases
- Schema Design — Designed a normalized database schema that could handle all import variations with proper constraints
- Processing Pipeline — Built a Node.js batch processor: file upload, parsing (xlsx), validation, transformation, staged insert, confirmation
- Error Handling — Each row validates independently so one bad row doesn't block the entire batch. Errors are collected and reported with row numbers and specific failure reasons
- Admin Dashboard — Built a simple UI for the ops team: drag-and-drop upload, progress tracking, error review, and one-click confirmation
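The row-level error reporting described above can be sketched roughly as follows. The type and function names here are illustrative, not taken from the production code:

```typescript
// Illustrative shape for per-row error reports (names are hypothetical,
// not the production API).
interface RowError {
  rowNumber: number  // 1-based row in the source spreadsheet
  field: string      // column that failed validation
  reason: string     // human-readable failure message
}

interface ImportReport {
  inserted: number
  errors: RowError[]
}

// Example: what the ops dashboard might render for a failed row
function formatError(e: RowError): string {
  return `Row ${e.rowNumber}, ${e.field}: ${e.reason}`
}
```

Keeping the row number and failing field on every error object is what makes the dashboard's error review actionable: the ops team can open the original spreadsheet and jump straight to the problem cell.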
Technical Decisions
- Streaming parser for large files (50K+ rows) to avoid memory issues
- Staged inserts with transaction rollback for data integrity
- Configurable validation rules so the ops team can adjust without code changes
- Background job queue for processing so the UI stays responsive
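One way the "rules without code changes" decision can work, sketched under assumed names (`FieldRule` and `validateRow` are illustrative): rules live as plain data, e.g. in a config table or JSON file, and a generic evaluator applies them, so the ops team edits data rather than code.

```typescript
// A declarative rule the ops team could edit as data (JSON or a DB row),
// evaluated generically so rule changes need no deploy.
// All names here are illustrative, not the production API.
interface FieldRule {
  field: string
  required?: boolean
  pattern?: string   // stored as a string so it serializes to JSON
  message: string
}

type Row = Record<string, string>

function validateRow(row: Row, rules: FieldRule[]): string[] {
  const errors: string[] = []
  for (const rule of rules) {
    const value = row[rule.field] ?? ""
    if (rule.required && value.trim() === "") {
      errors.push(rule.message)
      continue  // skip pattern check once the field is known missing
    }
    if (rule.pattern && value !== "" && !new RegExp(rule.pattern).test(value)) {
      errors.push(rule.message)
    }
  }
  return errors
}
```

With this shape, tightening a rule, say adding `{ field: "email", pattern: "^\\S+@\\S+$", message: "invalid email" }`, is a config change rather than a code change.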
Code: Pipeline Architecture
The core processing pipeline — each row validates independently so one bad row never blocks the batch:
// Simplified batch processor pipeline
async function processBatch(file: Buffer, config: ImportConfig) {
  const rows = parseExcel(file, { streaming: true })
  const results: ProcessResult = { success: [], errors: [] }
  let rowOffset = 0 // track absolute row numbers across chunks

  for await (const chunk of batch(rows, 500)) {
    const validated = chunk.map((row, i) => {
      const errors = config.rules
        .map(rule => rule.validate(row))
        .filter(Boolean)
      return errors.length
        ? { row, index: rowOffset + i, errors }
        : { row, index: rowOffset + i, data: config.transform(row) }
    })
    rowOffset += chunk.length

    const good = validated.filter(r => 'data' in r)
    const bad = validated.filter(r => 'errors' in r)

    if (good.length > 0) {
      // Staged insert: the whole chunk commits or rolls back together
      await db.transaction(async (tx) => {
        await tx.insert(records).values(good.map(r => r.data))
      })
    }

    results.success.push(...good)
    results.errors.push(...bad)
  }

  return results // e.g. { success: 49,847 rows, errors: 153 rows with details }
}

Results
- 15 min average processing time (was 4-6 hours)
- 99.8% data accuracy (was ~95%)
- Zero manual data entry
- 12 hrs saved per week
Key Takeaways
- The hardest part wasn't the code — it was understanding all the edge cases in the Excel files
- Streaming processing is essential once files exceed 10K rows
- Showing row-level errors with context is worth the extra effort — it builds trust with the ops team
- A simple drag-and-drop UI removes all friction from adoption
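The streaming takeaway above hinges on never holding the full file in memory. A minimal sketch of the chunking step, assuming rows arrive as a sync or async iterable from the parser (`batchChunks` is an illustrative name for the helper the pipeline calls):

```typescript
// Group an incoming row stream into fixed-size chunks so inserts run
// in bounded batches instead of building one giant array.
// Illustrative helper; the production code feeds this from xlsx streaming.
async function* batchChunks<T>(
  rows: Iterable<T> | AsyncIterable<T>,
  size: number
): AsyncGenerator<T[]> {
  let chunk: T[] = []
  for await (const row of rows) {
    chunk.push(row)
    if (chunk.length >= size) {
      yield chunk
      chunk = [] // start fresh; memory stays O(size), not O(file)
    }
  }
  if (chunk.length > 0) yield chunk // flush the final partial chunk
}
```

Because `for await...of` accepts plain sync iterables too, the same helper works in tests with an in-memory array and in production against a streaming parser.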
Need to automate a manual data workflow?
Let's Talk