Performance Optimization
Fast integrations mean better user experience, lower costs, and more reliable data syncing. This guide covers best practices and techniques to optimize your Retrieve integrations for maximum performance.
⚡ Why Performance Matters
- Faster Data Sync - Reduce integration execution time from minutes to seconds
- Lower Costs - Less compute time = lower infrastructure costs
- Better Reliability - Shorter jobs are less likely to timeout or fail
- Higher Throughput - Process more data in the same time window
- Reduced API Usage - Efficient queries save rate limits and quota
Understanding Integration Performance
Integration performance depends on several factors:
📊 Data Volume
How much data you're processing
Impact: High
🔌 API Performance
Speed of source/destination APIs
Impact: High
💻 Code Efficiency
How your rewrite functions are written
Impact: Medium
🌐 Network Latency
Distance and connection quality
Impact: Medium
🔄 Transformation Complexity
How much processing you're doing
Impact: Low-Medium
Query Optimization
The most impactful performance optimization is reducing the data you fetch and process.
1. Use Date Filters (Pull Only What Changed)
Instead of pulling all records every time, fetch only records modified since the last sync:
❌ Bad: Fetching Everything
// Fetches ALL orders every time
const orders = await api.get('/orders');
// Returns 10,000 orders (only 50 are new)
Result: 10,000 orders fetched every hour (9,950 duplicates)
✅ Good: Incremental Fetch
// Fetch only orders modified since last sync
const lastSyncTime = await getLastSyncTime();
const orders = await api.get('/orders', {
params: {
updated_at_min: lastSyncTime
}
});
// Returns only 50 new orders
Result: only 50 new orders fetched (99.5% reduction)
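The example above assumes a getLastSyncTime() helper. How you persist the sync timestamp depends on your setup; here is a minimal sketch, assuming a hypothetical key-value store (storage) available to your integration:
// Minimal sketch: persist the last sync timestamp between runs.
// `storage` is a hypothetical key-value store - substitute whatever
// persistence mechanism your integration has access to.
async function getLastSyncTime() {
  const saved = await storage.get('orders_last_sync');
  // Fall back to a wide window on the very first run
  return saved || new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();
}
async function setLastSyncTime(timestamp) {
  await storage.set('orders_last_sync', timestamp);
}
// After a successful sync, record the new high-water mark
await setLastSyncTime(new Date().toISOString());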
2. Limit Results with Pagination
Process large datasets in smaller batches instead of loading everything at once:
// Process large dataset in batches
let page = 1;
let allOrders = [];
while (true) {
const response = await api.get('/orders', {
params: {
page: page,
limit: 100 // Fetch 100 at a time
}
});
if (response.orders.length === 0) break;
allOrders = allOrders.concat(response.orders);
page++;
}
return { jobStatus: 1, data: allOrders };
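If the full dataset is too large to hold in memory, you can also process each page as it arrives instead of accumulating everything first. A sketch, assuming a hypothetical pushBatch() helper that writes one batch to the destination:
// Sketch: handle each page as it is fetched instead of accumulating
// the whole dataset in memory. `pushBatch` is a hypothetical helper.
let page = 1;
let processedCount = 0;
while (true) {
  const response = await api.get('/orders', {
    params: { page: page, limit: 100 }
  });
  if (response.orders.length === 0) break;
  await pushBatch(response.orders); // process this page, then discard it
  processedCount += response.orders.length;
  page++;
}
return { jobStatus: 1, data: { processed: processedCount } };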
3. Select Only Required Fields
Many APIs allow you to specify which fields to return. Only fetch what you need:
❌ Bad: Fetching All Fields
// Returns entire product object with all fields
const products = await api.get('/products');
Returns 50+ fields including images, HTML descriptions, and metadata
✅ Good: Select Specific Fields
// Request only needed fields
const products = await api.get('/products', {
params: {
fields: 'id,name,price,sku,inventory_quantity'
}
});
Returns only the 5 needed fields (90% smaller response)
Data Processing Optimization
1. Use Field Mapping Instead of Code
Field mapping is faster than rewrite functions for simple transformations:
⚠️ Slower: Rewrite Function
// Rewrite function for every record
let orders = job.data.data;
let transformed = orders.map(order => ({
id: order.id,
customer: order.customer.name,
total: order.total_price,
status: order.status
}));
return { jobStatus: 1, data: transformed };
Requires code execution for every record
✅ Faster: Field Mapping
{
"id": "id",
"customer": "customer.name",
"total": "total_price",
"status": "status"
}
Direct mapping, no code execution needed
2. Optimize Loops and Iterations
Efficient code makes a big difference when processing thousands of records:
❌ Bad: Nested Loops
let orders = job.data.data;
let customers = await getCustomers(); // 1000 customers
// O(n²) - nested loop
let enriched = orders.map(order => {
let customer = customers.find(c => c.id === order.customer_id);
return {
...order,
customer_name: customer?.name
};
});
O(n²) complexity - 10,000 orders × 1,000 customers = 10 million comparisons
✅ Good: Hash Map Lookup
let orders = job.data.data;
let customers = await getCustomers();
// Build hash map once - O(n)
let customerMap = {};
customers.forEach(c => {
customerMap[c.id] = c;
});
// Single loop lookup - O(n)
let enriched = orders.map(order => ({
...order,
customer_name: customerMap[order.customer_id]?.name
}));
O(n) complexity - 10,000 orders + 1,000 customers ≈ 11,000 operations (roughly 1,000x faster)
3. Avoid Unnecessary Transformations
Only transform data that actually needs transformation:
let products = job.data.data;
let processed = products.map(product => {
let result = { ...product };
// Only calculate if price changed
if (product.price !== product.previous_price) {
result.discount_percentage = calculateDiscount(product);
}
// Only enrich if category is missing
if (!product.category_name) {
result.category_name = lookupCategory(product.category_id);
}
return result;
});
4. Batch API Requests
If the destination API supports bulk operations, use them instead of individual requests:
❌ Bad: Individual API Calls
let orders = job.data.data;
// Bad: 1 API call per order
for (let order of orders) {
await api.post('/destination/orders', order);
}
// 1000 orders = 1000 API calls
1000 orders = 1000 API calls (slow, rate limit issues)
✅ Good: Batch API Calls
let orders = job.data.data;
// Good: Batch 100 orders per call
for (let i = 0; i < orders.length; i += 100) {
let batch = orders.slice(i, i + 100);
await api.post('/destination/orders/batch', { orders: batch });
}
// 1000 orders = 10 API calls
1000 orders = 10 API calls (100x fewer requests)
Network & Connection Optimization
1. Use Private Networks for On-Premise Systems
Tailscale VPN connections are significantly faster than routing through the public internet:
🌐 Public Internet
Multiple hops, variable latency
~100-300ms per request
🔒 Tailscale VPN
Direct peer-to-peer connection
~10-50ms per request (3-10x faster)
Learn more about Private Networks
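In your integration code, switching to the private network is usually just a matter of changing the hostname you connect to. A sketch with hypothetical hostnames:
// Hypothetical hostnames for illustration only.
// Public route: traffic traverses the open internet.
const publicResponse = await fetch('https://erp.example.com/api/orders');
// Private route: connect to the Tailscale (MagicDNS) hostname instead;
// the rest of the integration code stays the same.
const privateResponse = await fetch('http://erp.tailnet-example.ts.net/api/orders');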
2. Connection Pooling for Databases
Reuse database connections instead of creating new ones for each query:
// Connection pooling (handled by database client)
const mysql = require('mysql2/promise');
const pool = mysql.createPool({
host: 'database.local',
user: 'user',
password: 'pass',
database: 'mydb',
waitForConnections: true,
connectionLimit: 10, // Reuse up to 10 connections
queueLimit: 0
});
const [rows] = await pool.query('SELECT * FROM orders');
3. Compression for Large Payloads
Enable compression when transferring large amounts of data:
// Enable compression for API requests
const response = await fetch('https://api.example.com/large-data', {
headers: {
'Accept-Encoding': 'gzip, deflate, br',
'Content-Type': 'application/json'
}
});
// Data is automatically decompressed
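The example above covers compressed responses. If the destination API also accepts gzip-encoded request bodies (check its documentation), you can compress large payloads you send. A sketch using Node's built-in zlib; largeOrderArray and the endpoint are placeholders:
const zlib = require('zlib');
// Compress a large JSON payload before sending it.
// Only do this if the destination accepts Content-Encoding: gzip.
const payload = JSON.stringify({ orders: largeOrderArray });
const compressed = zlib.gzipSync(payload);
const response = await fetch('https://api.example.com/bulk-import', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Encoding': 'gzip'
  },
  body: compressed
});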
Caching Strategies
Cache Static or Rarely-Changed Data
Avoid re-fetching data that rarely changes:
// Cache static data that rarely changes
let cache = {
categories: null,
lastFetch: null
};
async function getCategories() {
// Cache for 24 hours
const CACHE_DURATION = 24 * 60 * 60 * 1000;
if (cache.categories &&
cache.lastFetch &&
(Date.now() - cache.lastFetch) < CACHE_DURATION) {
console.log('Using cached categories');
return cache.categories;
}
console.log('Fetching fresh categories');
cache.categories = await api.get('/categories');
cache.lastFetch = Date.now();
return cache.categories;
}
// Use cached categories
let categories = await getCategories();
Scheduling Optimization
1. Choose Appropriate Frequency
Not all integrations need to run every 5 minutes:
⚡ Real-Time (Webhooks)
Use for: Time-sensitive events (new orders, payments)
Performance: Best - only runs when needed
🔄 Every 5-15 Minutes
Use for: Frequent updates (inventory, order status)
Performance: Good if data volume is low
⏰ Hourly
Use for: Regular syncs (customer updates, product data)
Performance: Balanced frequency and efficiency
📅 Daily
Use for: Bulk operations (full catalog sync, reports)
Performance: Best for large datasets
2. Avoid Peak Hours
Schedule resource-intensive integrations during off-peak hours:
- ✅ Good: Run full product catalog sync at 2 AM when traffic is low
- ✅ Good: Schedule bulk customer imports on weekends
- ❌ Bad: Running heavy integrations during business hours
3. Stagger Multiple Integrations
If you have multiple integrations, spread them out instead of running all at once:
❌ All at Once
Integration A: Every hour at :00
Integration B: Every hour at :00
Integration C: Every hour at :00
All run simultaneously → resource contention
✅ Staggered
Integration A: Every hour at :00
Integration B: Every hour at :20
Integration C: Every hour at :40
Spread load evenly → better performance
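If your scheduler accepts cron expressions, staggering is simply a matter of offsetting the minute field. A sketch with hypothetical integration names (the exact scheduling syntax depends on your setup):
// Hypothetical schedule configuration - cron syntax shown for illustration.
// Each integration still runs hourly, but 20 minutes apart.
const schedules = {
  'integration-a': '0 * * * *',   // every hour at :00
  'integration-b': '20 * * * *',  // every hour at :20
  'integration-c': '40 * * * *'   // every hour at :40
};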
Monitoring & Identifying Bottlenecks
Key Performance Metrics
⏱️ Execution Time
Total time from start to finish
Target: <5 minutes for most integrations
📊 Records/Second
Processing throughput
Target: >100 records/second
🌐 API Response Time
How long external APIs take
Monitor for increases
💾 Memory Usage
RAM consumed during execution
Keep under 512MB
Using Console Logs for Debugging
Add timing logs to identify slow sections:
let startTime = Date.now();
console.log('Starting order fetch...');
let orders = await api.get('/orders');
console.log(`Fetched ${orders.length} orders in ${Date.now() - startTime}ms`);
let transformStart = Date.now();
let processed = orders.map(transformOrder);
console.log(`Transformed data in ${Date.now() - transformStart}ms`);
let pushStart = Date.now();
await api.post('/destination/orders', processed);
console.log(`Pushed to destination in ${Date.now() - pushStart}ms`);
console.log(`Total execution time: ${Date.now() - startTime}ms`);
Performance Checklist
Query Optimization
- Fetch only records changed since the last sync (date filters)
- Paginate large datasets instead of loading everything at once
- Request only the fields you actually need
Code Optimization
- Prefer field mapping over rewrite functions for simple transformations
- Replace nested loops with hash map lookups
- Batch API requests where the destination supports bulk operations
Network Optimization
- Use private networks (Tailscale) for on-premise systems
- Reuse database connections with connection pooling
- Enable compression for large payloads
Scheduling Optimization
- Match run frequency to how often the data actually changes
- Schedule heavy jobs during off-peak hours
- Stagger multiple integrations to avoid resource contention
Real-World Performance Improvements
Case Study 1: Order Sync Optimization
Problem
Integration fetching 50,000 orders every hour, taking 45 minutes to complete
Solution
- Added date filter: updated_at > last_sync_time
- Reduced each sync to ~200 orders
- Enabled batch API updates (100 orders per call)
Result
⚡ Execution time: 45 min → 2 min (95% reduction)
📊 API calls: 50,000 → 2 per sync
Case Study 2: Product Enrichment
Problem
Looking up category data for each product using nested loops, taking 30 minutes for 5,000 products
Solution
- Built category hash map once at the start
- Changed from O(n²) to O(n) complexity
- Added caching for category data (changes rarely)
Result
⚡ Execution time: 30 min → 3 min (90% reduction)
💾 Memory usage: 2GB → 200MB
Case Study 3: Customer Import
Problem
Full customer import (10,000 customers) failing due to timeout after 1 hour
Solution
- Implemented pagination: 500 customers per page
- Changed from full import to incremental sync
- Used field selection to fetch only needed fields
Result
⚡ Execution time: 60+ min → 8 min (87% reduction)
✅ No more timeouts, reliable execution
When to Optimize
Focus optimization efforts where they'll have the most impact:
🔴 High Priority - Optimize Now
- Execution time >10 minutes
- Frequent timeouts or failures
- Processing same records repeatedly
- Hitting API rate limits
🟡 Medium Priority - Optimize Soon
- Execution time 5-10 minutes
- Growing dataset causing slowdowns
- Occasional performance issues
- Approaching rate limit thresholds
🟢 Low Priority - Monitor
- Execution time <5 minutes
- Stable performance over time
- Small to medium datasets
- No user complaints
Next Steps
- Error Handling - Implement retry logic and notifications for failed jobs
- Field Mapping - Use efficient field mapping instead of code
- Rewrite Functions - Write performant transformation code
- Webhooks - Replace scheduled polls with real-time webhooks