Performance Optimization

Fast integrations mean better user experience, lower costs, and more reliable data syncing. This guide covers best practices and techniques to optimize your Retrieve integrations for maximum performance.

⚡ Why Performance Matters

  • Faster Data Sync - Reduce integration execution time from minutes to seconds
  • Lower Costs - Less compute time = lower infrastructure costs
  • Better Reliability - Shorter jobs are less likely to timeout or fail
  • Higher Throughput - Process more data in the same time window
  • Reduced API Usage - Efficient queries save rate limits and quota

Understanding Integration Performance

Integration performance depends on several factors:

📊 Data Volume

How much data you're processing

Impact: High

🔌 API Performance

Speed of source/destination APIs

Impact: High

💻 Code Efficiency

How your rewrite functions are written

Impact: Medium

🌐 Network Latency

Distance and connection quality

Impact: Medium

🔄 Transformation Complexity

How much processing you're doing

Impact: Low-Medium

Query Optimization

The most impactful performance optimization is reducing the data you fetch and process.

1. Use Date Filters (Pull Only What Changed)

Instead of pulling all records every time, fetch only records modified since the last sync:

❌ Bad: Fetching Everything

// Fetches ALL orders every time
const orders = await api.get('/orders');
// Returns 10,000 orders (only 50 are new)

Result: 10,000 orders fetched every hour (9,950 of them already synced)

✅ Good: Incremental Fetch

// Fetch only orders modified since last sync
const lastSyncTime = await getLastSyncTime();
const orders = await api.get('/orders', {
  params: {
    updated_at_min: lastSyncTime
  }
});
// Returns only 50 new orders

Result: 50 new orders fetched (99.5% reduction)
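
The getLastSyncTime() helper above is not built in. Here is a minimal sketch, assuming your platform gives you some persistent key-value storage between runs; getState and setState are hypothetical names, so substitute whatever storage mechanism you actually have:

// Hypothetical helpers - replace getState/setState with your platform's persistent storage
async function getLastSyncTime() {
  const stored = await getState('last_sync_time'); // hypothetical read
  // Fall back to a 24-hour window on the very first run
  return stored || new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();
}

async function saveLastSyncTime() {
  // Record the current time once the sync has completed successfully
  await setState('last_sync_time', new Date().toISOString()); // hypothetical write
}

Save the timestamp only after the job finishes successfully, so a failed run is retried from the same point instead of skipping records.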

2. Limit Results with Pagination

Process large datasets in smaller batches instead of loading everything at once:

// Process large dataset in batches
let page = 1;
let allOrders = [];

while (true) {
  const response = await api.get('/orders', {
    params: {
      page: page,
      limit: 100 // Fetch 100 at a time
    }
  });
  
  if (response.orders.length === 0) break;
  
  allOrders = allOrders.concat(response.orders);
  page++;
}

return { jobStatus: 1, data: allOrders };

3. Select Only Required Fields

Many APIs allow you to specify which fields to return. Only fetch what you need:

❌ Bad: Fetching All Fields

// Returns entire product object with all fields
const products = await api.get('/products');

Returns 50+ fields including images, HTML descriptions, metadata

✅ Good: Select Specific Fields

// Request only needed fields
const products = await api.get('/products', {
  params: {
    fields: 'id,name,price,sku,inventory_quantity'
  }
});

Returns only 5 needed fields (90% smaller response)

Data Processing Optimization

1. Use Field Mapping Instead of Code

Field mapping is faster than rewrite functions for simple transformations:

⚠️ Slower: Rewrite Function

// Rewrite function for every record
let orders = job.data.data;

let transformed = orders.map(order => ({
  id: order.id,
  customer: order.customer.name,
  total: order.total_price,
  status: order.status
}));

return { jobStatus: 1, data: transformed };

Requires code execution for every record

✅ Faster: Field Mapping

{
  "id": "id",
  "customer": "customer.name",
  "total": "total_price",
  "status": "status"
}

Direct mapping, no code execution needed

2. Optimize Loops and Iterations

Efficient code makes a big difference when processing thousands of records:

❌ Bad: Nested Loops

let orders = job.data.data;
let customers = await getCustomers(); // 1000 customers

// O(n²) - nested loop
let enriched = orders.map(order => {
  let customer = customers.find(c => c.id === order.customer_id);
  return {
    ...order,
    customer_name: customer?.name
  };
});

O(n × m) complexity - 10,000 orders × 1,000 customers = up to 10 million comparisons

✅ Good: Hash Map Lookup

let orders = job.data.data;
let customers = await getCustomers();

// Build hash map once - O(n)
let customerMap = {};
customers.forEach(c => {
  customerMap[c.id] = c;
});

// Single loop lookup - O(n)
let enriched = orders.map(order => ({
  ...order,
  customer_name: customerMap[order.customer_id]?.name
}));

O(n + m) complexity - roughly 11,000 operations (about 1,000x fewer)

3. Avoid Unnecessary Transformations

Only transform data that actually needs transformation:

let products = job.data.data;

let processed = products.map(product => {
  let result = { ...product };
  
  // Only calculate if price changed
  if (product.price !== product.previous_price) {
    result.discount_percentage = calculateDiscount(product);
  }
  
  // Only enrich if category is missing
  if (!product.category_name) {
    result.category_name = lookupCategory(product.category_id);
  }
  
  return result;
});

4. Batch API Requests

If the destination API supports bulk operations, use them instead of individual requests:

❌ Bad: Individual API Calls

let orders = job.data.data;

// Bad: 1 API call per order
for (let order of orders) {
  await api.post('/destination/orders', order);
}
// 1000 orders = 1000 API calls

1000 orders = 1000 API calls (slow, rate limit issues)

✅ Good: Batch API Calls

let orders = job.data.data;

// Good: Batch 100 orders per call
for (let i = 0; i < orders.length; i += 100) {
  let batch = orders.slice(i, i + 100);
  await api.post('/destination/orders/batch', { orders: batch });
}
// 1000 orders = 10 API calls

1000 orders = 10 API calls (100x fewer requests)
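
If the destination also throttles bursts of requests, pacing the batch loop helps you stay under its rate limit. A minimal sketch, assuming the same /destination/orders/batch endpoint as above; the 500 ms delay is illustrative, not a documented limit:

let orders = job.data.data;

const BATCH_SIZE = 100;
const DELAY_MS = 500; // illustrative pause between batch calls

// Small helper to wait between requests
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

for (let i = 0; i < orders.length; i += BATCH_SIZE) {
  let batch = orders.slice(i, i + BATCH_SIZE);
  await api.post('/destination/orders/batch', { orders: batch });

  // Pause briefly so the calls stay under the destination's rate limit
  if (i + BATCH_SIZE < orders.length) {
    await sleep(DELAY_MS);
  }
}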

Network & Connection Optimization

1. Use Private Networks for On-Premise Systems

Tailscale VPN connections are significantly faster than routing through public internet:

🌐 Public Internet

Multiple hops, variable latency

~100-300ms per request

🔒 Tailscale VPN

Direct peer-to-peer connection

~10-50ms per request (3-10x faster)

Learn more about Private Networks

2. Connection Pooling for Databases

Reuse database connections instead of creating new ones for each query:

// Connection pooling (handled by database client)
const mysql = require('mysql2/promise');

const pool = mysql.createPool({
  host: 'database.local',
  user: 'user',
  password: 'pass',
  database: 'mydb',
  waitForConnections: true,
  connectionLimit: 10, // Reuse up to 10 connections
  queueLimit: 0
});

const [rows] = await pool.query('SELECT * FROM orders');

3. Compression for Large Payloads

Enable compression when transferring large amounts of data:

// Enable compression for API requests
const response = await fetch('https://api.example.com/large-data', {
  headers: {
    'Accept-Encoding': 'gzip, deflate, br',
    'Content-Type': 'application/json'
  }
});

// Data is automatically decompressed

Caching Strategies

Cache Static or Rarely-Changed Data

Don't fetch data that doesn't change frequently:

// Cache static data that rarely changes
let cache = {
  categories: null,
  lastFetch: null
};

async function getCategories() {
  // Cache for 24 hours
  const CACHE_DURATION = 24 * 60 * 60 * 1000;
  
  if (cache.categories && 
      cache.lastFetch && 
      (Date.now() - cache.lastFetch) < CACHE_DURATION) {
    console.log('Using cached categories');
    return cache.categories;
  }
  
  console.log('Fetching fresh categories');
  cache.categories = await api.get('/categories');
  cache.lastFetch = Date.now();
  
  return cache.categories;
}

// Use cached categories
let categories = await getCategories();

Scheduling Optimization

1. Choose Appropriate Frequency

Not all integrations need to run every 5 minutes:

⚡ Real-Time (Webhooks)

Use for: Time-sensitive events (new orders, payments)

Performance: Best - only runs when needed

🔄 Every 5-15 Minutes

Use for: Frequent updates (inventory, order status)

Performance: Good if data volume is low

⏰ Hourly

Use for: Regular syncs (customer updates, product data)

Performance: Balanced frequency and efficiency

📅 Daily

Use for: Bulk operations (full catalog sync, reports)

Performance: Best for large datasets

2. Avoid Peak Hours

Schedule resource-intensive integrations during off-peak hours:

  • Good: Run full product catalog sync at 2 AM when traffic is low
  • Good: Schedule bulk customer imports on weekends
  • Bad: Running heavy integrations during business hours

3. Stagger Multiple Integrations

If you have multiple integrations, spread them out instead of running all at once:

❌ All at Once

Integration A: Every hour at :00

Integration B: Every hour at :00

Integration C: Every hour at :00

All run simultaneously → resource contention

✅ Staggered

Integration A: Every hour at :00

Integration B: Every hour at :20

Integration C: Every hour at :40

Spread load evenly → better performance
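
If your scheduler accepts cron expressions, the staggered plan above maps to simple minute offsets. The names and syntax below are illustrative; use whatever scheduling configuration Retrieve exposes:

// Illustrative cron expressions for the staggered schedule above
const schedules = {
  integrationA: '0 * * * *',  // every hour at :00
  integrationB: '20 * * * *', // every hour at :20
  integrationC: '40 * * * *'  // every hour at :40
};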

Monitoring & Identifying Bottlenecks

Key Performance Metrics

⏱️ Execution Time

Total time from start to finish

Target: <5 minutes for most integrations

📊 Records/Second

Processing throughput

Target: >100 records/second

🌐 API Response Time

How long external APIs take

Monitor for increases

💾 Memory Usage

RAM consumed during execution

Keep under 512MB

Using Console Logs for Debugging

Add timing logs to identify slow sections:

let startTime = Date.now();
console.log('Starting order fetch...');

let orders = await api.get('/orders');
console.log(`Fetched ${orders.length} orders in ${Date.now() - startTime}ms`);

let transformStart = Date.now();
let processed = orders.map(transformOrder);
console.log(`Transformed data in ${Date.now() - transformStart}ms`);

let pushStart = Date.now();
await api.post('/destination/orders', processed);
console.log(`Pushed to destination in ${Date.now() - pushStart}ms`);

console.log(`Total execution time: ${Date.now() - startTime}ms`);
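
The same timing data can be used to track the Records/Second and Memory Usage metrics above. A small sketch building on the example; process.memoryUsage() is available in Node.js environments:

let startTime = Date.now();

let orders = await api.get('/orders');
let processed = orders.map(transformOrder);
await api.post('/destination/orders', processed);

// Throughput: records processed per second of wall-clock time
let elapsedSeconds = (Date.now() - startTime) / 1000;
console.log(`Throughput: ${(orders.length / elapsedSeconds).toFixed(1)} records/second`);

// Memory: heap used at the end of the run (Node.js)
let heapMb = process.memoryUsage().heapUsed / 1024 / 1024;
console.log(`Heap used: ${heapMb.toFixed(0)} MB`);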

Performance Checklist

Query Optimization

  • Fetch only records changed since the last sync (date filters)
  • Paginate large datasets instead of loading everything at once
  • Request only the fields you actually need

Code Optimization

  • Prefer field mapping over rewrite functions for simple transformations
  • Replace nested lookups with hash map lookups
  • Skip transformations that aren't needed
  • Use batch endpoints instead of per-record API calls
  • Cache static or rarely-changed data

Network Optimization

  • Use a private network (Tailscale) for on-premise systems
  • Reuse database connections with a connection pool
  • Enable compression for large payloads

Scheduling Optimization

  • Match sync frequency to how often the data actually changes
  • Schedule heavy jobs during off-peak hours
  • Stagger multiple integrations so they don't run at the same time

Real-World Performance Improvements

Case Study 1: Order Sync Optimization

Problem

Integration fetching 50,000 orders every hour, taking 45 minutes to complete

Solution

  • Added date filter: updated_at > last_sync_time
  • Reduced to ~200 orders per sync
  • Enabled batch API updates (100 orders per call)

Result

⚡ Execution time: 45 min → 2 min (95% reduction)

📊 API calls: 50,000 → 2 per sync

Case Study 2: Product Enrichment

Problem

Looking up category data for each product using nested loops, taking 30 minutes for 5,000 products

Solution

  • Built category hash map once at the start
  • Changed from O(n²) to O(n) complexity
  • Added caching for category data (changes rarely)

Result

⚡ Execution time: 30 min → 3 min (90% reduction)

💾 Memory usage: 2GB → 200MB

Case Study 3: Customer Import

Problem

Full customer import (10,000 customers) failing due to timeout after 1 hour

Solution

  • Implemented pagination: 500 customers per page
  • Changed from full import to incremental sync
  • Used field selection to fetch only needed fields

Result

⚡ Execution time: 60+ min → 8 min (87% reduction)

✅ No more timeouts, reliable execution

When to Optimize

Focus optimization efforts where they'll have the most impact:

🔴 High Priority - Optimize Now

  • Execution time >10 minutes
  • Frequent timeouts or failures
  • Processing same records repeatedly
  • Hitting API rate limits

🟡 Medium Priority - Optimize Soon

  • Execution time 5-10 minutes
  • Growing dataset causing slowdowns
  • Occasional performance issues
  • Approaching rate limit thresholds

🟢 Low Priority - Monitor

  • Execution time <5 minutes
  • Stable performance over time
  • Small to medium datasets
  • No user complaints

Next Steps