15. Data-Driven Decisions: From a Free Analytics Setup to Smart Recommendations
Going live is just a product's "birth"; data is the "food" for its growth. Without being fed data, a product stops evolving and eventually "starves to death" in the competition. Today, let's talk about how to build a sustainable "food supply system" for our application from scratch: an analytics system.
In the early stages of this project, I faced a classic dilemma: do I take the "easy path" and use an off-the-shelf solution like Google Analytics, or do I take the "hard path" and invest time in building my own analytics system?
Ultimately, I chose a "progressive" path—one that was more challenging but offered far greater long-term rewards. The thinking behind that decision, and the hands-on experience we gained building it from the ground up, is the core of what I want to share with you in this chapter.
Why a "Progressive" Analytics Strategy?
This decision came down to weighing three core trade-offs.
Trade-off #1: Data Ownership
As a content platform, user behavior data is our most valuable strategic asset. Using a third-party tool means the control and interpretation of that data are not entirely in our hands. Especially when we plan to build our own recommendation systems and AI models on top of this data, having complete, raw, and freely accessible data becomes a non-negotiable prerequisite.
Trade-off #2: Business Customization
Generic tools like GA can tell you "which page has the highest traffic." But they struggle to answer the more specific, business-critical questions we care about:
- Which photo collections do users spend the most time in?
- What is the conversion rate for users navigating from which article to which collection?
- On average, how many photos does a user view before deciding to like or comment?
To answer these questions, we must be able to define a custom event model and track the user behaviors that are truly valuable to our business.
Trade-off #3: Cost and Technical Acumen
For an indie developer, every dollar counts. Building our own system avoids a potentially hefty subscription fee in the early days. More importantly, the process of building it ourselves gives us a "pixel-perfect" understanding of the entire data flow, which is an invaluable technical foundation for building more complex intelligent applications in the future.
Our "Progressive" Strategy Blueprint
The core of this strategy is: at different stages, smartly leverage free platform capabilities while focusing on building our own core data assets.
- Phase 1 (0-10k Users): Build a lightweight, custom system. We'll reuse Next.js API Routes, Prisma, and Vercel Postgres to build our own data collection pipeline at zero cost. The priority in this phase is designing a future-proof data schema.
- Phase 2 (10k-100k Users): Introduce professional tools as a supplement. For example, use Vercel Analytics for performance monitoring and Core Web Vitals (see the sketch after this list), while continuing to collect our core business data with our own system.
- Phase 3 (100k+ Users): Monetize the data's value. Based on the data we've accumulated, we can start building smart recommendation features and user profiles, allowing our data asset to truly generate compound interest.
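As a quick aside on Phase 2: adding Vercel Analytics to a Next.js App Router project is a one-line addition to the root layout. This is a minimal sketch; the file path and surrounding markup are illustrative, not taken from the project's actual code.

```tsx
// app/layout.tsx (illustrative): adding Vercel Analytics alongside our own logger
import { Analytics } from '@vercel/analytics/react'

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html lang="en">
      <body>
        {children}
        {/* Vercel's script reports page views and Web Vitals; our custom system still owns business events */}
        <Analytics />
      </body>
    </html>
  )
}
```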
In Practice: Key Designs of a Self-Built Analytics System
Now, let's get our hands dirty and achieve our Phase 1 goal.
The Data Schema: A Data Warehouse Blueprint Designed for the Future
A good schema design must have strong foresight.
```sql
-- Our AnalyticsEvent table structure
CREATE TABLE "analytics_events" (
  "id"               VARCHAR PRIMARY KEY,
  "event_name"       VARCHAR NOT NULL,      -- Event name, e.g., 'photo_view'
  "timestamp"        TIMESTAMP NOT NULL,    -- Client-side timestamp of the event
  "date"             DATE NOT NULL,         -- Partition key for efficient daily queries and archiving
  "session_id"       VARCHAR NOT NULL,      -- Session ID to track a single user visit
  "user_id"          VARCHAR,               -- The ID of the logged-in user
  "page"             VARCHAR NOT NULL,      -- The page where the event occurred
  "referrer"         VARCHAR,               -- The referring page
  "properties"       JSONB,                 -- The core! A flexible JSON field for custom event properties
  "performance"      JSONB,                 -- For storing Core Web Vitals and other performance data
  "server_timestamp" BIGINT,                -- The timestamp when the server received the event
  "environment"      VARCHAR DEFAULT 'production'
);
```
The highlights of this design include:
- The `JSONB` Field: This gives us infinite flexibility. If we want to add a new tracking property to an event in the future (like a photo's `color_temperature`), we don't need to alter the database schema; we just add a new field to the `properties` JSON.
- The `date` Partition Key: When our data reaches tens or hundreds of millions of rows, partitioning queries by date can increase query speed by orders of magnitude.
- Dual Timestamps: Client-side time can be inaccurate. Having a server-side timestamp gives us an absolutely reliable time benchmark for our analysis.
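To make the schema concrete, here is what a single stored row might look like, written as a TypeScript object. All values are invented for illustration; note how the experimental `color_temperature` property lives inside `properties`, so collecting it requires no `ALTER TABLE`.

```ts
// A hypothetical 'photo_view' row as it would land in analytics_events (values are made up)
const exampleEvent = {
  id: 'evt_001',                                // any unique ID scheme works (uuid, cuid, ...)
  event_name: 'photo_view',
  timestamp: new Date('2024-05-01T10:30:00Z'),  // client-side time
  date: new Date('2024-05-01'),                 // partition key
  session_id: 'sess_abc123',
  user_id: 'user_42',                           // null for anonymous visitors
  page: '/collections/tokyo-street',
  referrer: '/blog/street-photography-tips',
  properties: {
    photoId: 'photo_789',
    dwellTimeMs: 8200,
    color_temperature: 'warm',                  // new property: just add it, no schema change
  },
  performance: { lcp: 1800, cls: 0.02 },        // Core Web Vitals samples
  server_timestamp: 1714559401000,              // set by the server on receipt
  environment: 'production',
}
```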
Asynchronous Processing: Never Let Analytics Hurt the User Experience
Analytics data is important, but it should never block a user's core actions. Our strategy is: batch and send asynchronously on the client, write asynchronously on the server. It's like mailing a letter, not making an emergency phone call.
Client-Side: Collect a few letters before heading to the postbox
```ts
// Core concept of the client-side AnalyticsLogger
interface AnalyticsEvent {
  eventName: string
  properties?: Record<string, unknown>
  timestamp: number
  // ...page, sessionId, referrer, etc.
}

class AnalyticsLogger {
  private eventQueue: AnalyticsEvent[] = [] // This is our "mailbox"
  private flushInterval = 5000              // Go to the "post office" every 5 seconds

  constructor() {
    // A timer makes sure queued events still go out even if the user stops interacting
    setInterval(() => this.flush(), this.flushInterval)
  }

  trackEvent(eventName: string, properties?: Record<string, unknown>) {
    this.eventQueue.push({ eventName, properties, timestamp: Date.now() /* ...page, sessionId, etc. */ })
    // If we have 10 letters, mail them right away; otherwise the 5-second timer handles it
    if (this.eventQueue.length >= 10) {
      this.flush()
    }
  }

  private async flush() {
    if (this.eventQueue.length === 0) return
    const eventsToSend = [...this.eventQueue]
    this.eventQueue = [] // Empty the mailbox

    // Send asynchronously without blocking user actions.
    // `keepalive: true` ensures the request will try its best to complete, even if the user closes the page.
    fetch('/api/analytics', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ events: eventsToSend }),
      keepalive: true,
    }).catch(() => {
      // If sending fails, put the unsent letters back at the front of the queue
      this.eventQueue.unshift(...eventsToSend)
    })
  }
}
```
Server-Side: Reply as soon as you get the mail; read it later
```ts
// The server-side API: /api/analytics
import { NextRequest, NextResponse } from 'next/server'
import { prisma } from '@/lib/prisma' // assumed location of the shared Prisma client

export async function POST(request: NextRequest) {
  try {
    const { events } = await request.json()

    // Validate and cleanse the data
    const validEvents = events.map((event: Record<string, unknown>) => ({
      ...event,
      /* ...add IP, User-Agent, server_timestamp, etc. ... */
    }))

    // Key: kick off the time-consuming database write in the background, without awaiting it
    prisma.analyticsEvent
      .createMany({
        data: validEvents,
        skipDuplicates: true,
      })
      .catch((err) => {
        // If the write fails, log the error but don't disrupt the main flow
        console.error('Analytics write to DB failed:', err)
      })

    // Immediately tell the client, "Mail received, you can carry on."
    return NextResponse.json({ success: true })
  } catch (error) {
    return NextResponse.json({ error: 'Invalid request' }, { status: 400 })
  }
}
```
This asynchronous processing system perfectly achieves performance decoupling between our analytics system and our main business logic.
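To make the "fire-and-forget" point concrete, here is a hedged sketch of how the logger might be used from the photo viewer. The singleton export and the event properties are illustrative, not the project's actual API; the point is that tracking is a synchronous push into a queue and is never awaited in the UI path.

```ts
// analytics-client.ts (illustrative): one shared logger instance for the whole app
export const analytics = new AnalyticsLogger()

// PhotoCard.tsx (illustrative): track a view once the photo has been on screen for a moment
function onPhotoViewed(photoId: string, collectionId: string, dwellTimeMs: number) {
  // Synchronous push into the in-memory queue; the network request happens later, in flush()
  analytics.trackEvent('photo_view', { photoId, collectionId, dwellTimeMs })
}

function onPhotoLiked(photoId: string) {
  analytics.trackEvent('photo_like', { photoId })
}
```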
The Smart Recommendation System: From Data to Value
Theory is always dry, so let's focus on a highly specific business scenario: dynamic photo sorting on the Collection detail page.
Currently, the photos on the detail page are shown in a fixed order. But with our analytics data, we can deliver personalized, "a thousand faces for a thousand people" sorting: each user sees the photos they are most likely to love.
What are our "Raw Ingredients"?
Thanks to our forward-thinking planning, we now have a rich set of "raw data ingredients" on hand:
- A user's dwell time and scroll depth on different photos.
- Records of a user's likes, comments, and shares.
- The navigation paths a user takes between different Collections.
```ts
// From our analytics data, we can distill a user preference profile like this
async function analyzeUserPreferences(userId: string) {
  const userEvents = await prisma.analyticsEvent.findMany({
    where: {
      userId,
      eventName: { in: ['photo_view', 'photo_like', 'photo_comment'] },
    },
    // ...
  })

  return {
    viewedPhotos: extractViewedPhotos(userEvents),
    likedPhotos: extractLikedPhotos(userEvents),
    avgViewTime: calculateAverageViewTime(userEvents),
    preferredStyles: inferStylePreferences(userEvents), // e.g., prefers "black and white" or "street" styles
  }
}
```
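The helpers above are where the `properties` JSONB field pays off. As one hedged example, `extractLikedPhotos` could be implemented roughly like this, assuming like events carry a `photoId` in their properties (as in the tracking sketch earlier); the `RawEvent` type is just a local convenience.

```ts
// The raw rows come back from prisma.analyticsEvent.findMany(); we only need two fields here
type RawEvent = { eventName: string; properties: unknown }

// Pull the distinct photo IDs a user has liked out of the raw event stream
function extractLikedPhotos(events: RawEvent[]): string[] {
  const liked = new Set<string>()
  for (const event of events) {
    if (event.eventName !== 'photo_like') continue
    // `properties` is the JSONB column, so narrow the untyped JSON value carefully
    const props = event.properties as { photoId?: string } | null
    if (props && typeof props.photoId === 'string') liked.add(props.photoId)
  }
  return [...liked]
}
```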
The Recommendation Algorithm: A Lightweight Yet Effective Recipe
In the early stages, we don't need a complex algorithm like Netflix's. We can design a lightweight, three-step recommendation strategy.
Step 1: Multi-Channel Recall - "The Audition"
The goal of "recall" is to quickly, without worrying too much about precision, "audition" a few hundred candidate photos from a pool of thousands. We use multiple strategies to ensure the results are both relevant and diverse.
```python
# Pseudocode: A multi-channel recall strategy
def multi_recall_strategy(user_id, collection_id):
    candidates = set()

    # Strategy 1: Collaborative filtering - find photos liked by users with similar tastes
    similar_users = find_similar_users(user_id)
    for user in similar_users:
        candidates.update(get_user_liked_photos(user, collection_id))

    # Strategy 2: Content-based similarity - find photos with styles similar to what the user has liked before
    user_preferences = get_user_style_preferences(user_id)
    candidates.update(find_photos_with_similar_style(user_preferences))

    # Strategy 3: Popularity recall - add in the hottest photos from the last week
    candidates.update(get_popular_photos(collection_id, time_window='7d'))

    # Strategy 4: Exploration recall - add some random new photos to prevent filter bubbles
    candidates.update(get_recent_photos(collection_id, limit=10))

    return list(candidates)
```
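One slice of this can already be answered directly by the analytics_events table we designed earlier: the popularity channel is just a grouped count of recent like events. Here is a hedged TypeScript sketch using Prisma's raw query; it assumes photo_like events store a `photoId` in `properties`, and the client import path is an assumption.

```ts
import { Prisma } from '@prisma/client'
import { prisma } from '@/lib/prisma' // assumed location of the shared Prisma client

// Top photos by likes over the last 7 days, for the popularity-recall channel
async function getPopularPhotoIds(limit = 50): Promise<string[]> {
  const rows = await prisma.$queryRaw<{ photo_id: string; likes: bigint }[]>(Prisma.sql`
    SELECT properties->>'photoId' AS photo_id, COUNT(*) AS likes
    FROM analytics_events
    WHERE event_name = 'photo_like'
      AND date >= CURRENT_DATE - INTERVAL '7 days'
      AND properties->>'photoId' IS NOT NULL
    GROUP BY properties->>'photoId'
    ORDER BY likes DESC
    LIMIT ${limit}
  `)
  return rows.map((row) => row.photo_id)
}
```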
Step 2: Ranking - "The Judge's Selection"
The goal of "ranking" is to take the hundreds of auditioned photos and give each one a precise, personalized score. For performance and cost reasons, we'll start with a lightweight dual-tower model concept.
- User Tower: Generates a vector representing the user's interests based on their historical behavior (e.g., `[0.8, 0.2, 0.9]`, representing preferences for "black & white," "landscape," and "portrait").
- Item Tower: Generates a vector representing the photo's style based on its features (tags, colors, composition).
```python
# Pseudocode: A lightweight dual-tower ranking model
def lightweight_ranking_model(user_embedding, photo_embeddings):
    scores = []
    for photo_emb in photo_embeddings:
        # By calculating the "cosine similarity" between the user vector and each photo vector,
        # we get a score representing the user's likely preference for that photo.
        similarity_score = cosine_similarity(user_embedding, photo_emb)
        scores.append(similarity_score)
    return scores
```
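The only real math in this step is cosine similarity, which takes just a few lines in any language. A TypeScript version, for reference:

```ts
// Cosine similarity between two equal-length embedding vectors, in [-1, 1]
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // degenerate vectors get a neutral score
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Example: a "black & white + portrait" user vs. a mostly-landscape photo
cosineSimilarity([0.8, 0.2, 0.9], [0.1, 0.9, 0.2]) // ≈ 0.39, a weak match
```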
Step 3: The Policy Layer - "The Director's Cut"
The ranking score is purely model-based, but real-world business scenarios require some "human intervention."
```python
# Pseudocode: Applying business rule interventions
def apply_business_policies(ranked_photos, user_context):
    final_list = []
    for photo, score in ranked_photos:
        # New user cold start: boost popular content for new users
        if user_context.get('is_new_user') and photo.is_popular:
            score *= 1.2

        # Diversity strategy: prevent photos from the same photographer from dominating the results
        if count_photographer(final_list, photo.photographer) > 3:
            score *= 0.8

        # Deboosting: reduce the score of photos with similar styles appearing too close together
        # ...

        final_list.append((photo, score))

    # Return the list after re-sorting
    return sorted(final_list, key=lambda x: x[1], reverse=True)
```
With these "three steps," we can achieve a surprisingly effective personalized recommendation system at a manageable cost.
```python
# The complete recommendation pipeline
def recommend_photos_for_collection(user_id: str, collection_id: str, limit: int = 20):
    """Generate a personalized photo sort order for a collection detail page."""
    # 1. Get candidate photos
    candidate_photos = get_collection_photos(collection_id)

    # 2. Multi-channel recall
    recall_results = multi_recall_strategy(user_id, collection_id, candidate_photos)

    # 3. Merge recall results
    merged_candidates = merge_recall_results(recall_results)

    # 4. Generate user and photo embeddings
    user_embedding = generate_user_embedding(user_id)
    photo_embeddings = [generate_photo_embedding(photo) for photo in merged_candidates]

    # 5. Ranking
    ranking_scores = lightweight_ranking_model(user_embedding, photo_embeddings)

    # 6. Apply business policies
    user_context = get_user_context(user_id)
    final_recommendations = apply_business_policies(
        list(zip(merged_candidates, ranking_scores)),
        user_context
    )

    return final_recommendations[:limit]
```
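On the product side, this pipeline would sit behind the collection detail API, with a safety net: anonymous visitors and any recommender failure fall back to the default order. A hedged Next.js route sketch follows; the route path, the `@/lib/recommendations` module, and both imported function names are assumptions, standing in for a TypeScript port of the pipeline above.

```ts
// app/api/collections/[id]/photos/route.ts (illustrative)
import { NextRequest, NextResponse } from 'next/server'
// Hypothetical module: a TypeScript port of the pipeline above plus the default-order query
import { recommendPhotosForCollection, getCollectionPhotosDefaultOrder } from '@/lib/recommendations'

export async function GET(request: NextRequest, { params }: { params: { id: string } }) {
  const userId = request.cookies.get('userId')?.value // however the app identifies the visitor

  // Anonymous visitors simply get the curated default order
  if (!userId) {
    return NextResponse.json({ photos: await getCollectionPhotosDefaultOrder(params.id) })
  }

  try {
    // Personalized order from the recall -> ranking -> policy pipeline
    const photos = await recommendPhotosForCollection(userId, params.id, 20)
    return NextResponse.json({ photos, personalized: true })
  } catch (err) {
    // Recommendations are a nice-to-have; never let them break the page
    console.error('Recommendation pipeline failed, falling back to default order:', err)
    return NextResponse.json({ photos: await getCollectionPhotosDefaultOrder(params.id) })
  }
}
```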
More Scenarios Driven by Analytics
This is just the beginning. With our data, we can do so much more:
- Smart Homepage Sorting: Dynamically adjust the order of content on the homepage based on user preferences.
- Comment Sentiment Analysis: Identify product pain points and highlights by analyzing the emotional tone of user comments.
- User Churn Prediction: Proactively identify users at risk of churning based on changes in their behavior patterns (e.g., decreased visit frequency) and take measures to retain them.
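Taking the churn example: a simple first signal is whether a user's visit frequency is trending down, which our event table can answer with a grouped query. Below is a hedged sketch; the thresholds, the function name, and the Prisma client import path are illustrative, not a production churn model.

```ts
import { prisma } from '@/lib/prisma' // assumed location of the shared Prisma client

// Compare distinct active days in the last 14 days vs. the 14 days before that
async function looksLikelyToChurn(userId: string): Promise<boolean> {
  const now = new Date()
  const daysAgo = (n: number) => new Date(now.getTime() - n * 24 * 60 * 60 * 1000)

  const activeDays = async (from: Date, to: Date) => {
    const rows = await prisma.analyticsEvent.groupBy({
      by: ['date'], // one row per distinct day with activity
      where: { userId, timestamp: { gte: from, lt: to } },
    })
    return rows.length
  }

  const recent = await activeDays(daysAgo(14), now)
  const previous = await activeDays(daysAgo(28), daysAgo(14))

  // Illustrative rule: previously a regular visitor, now coming less than half as often
  return previous >= 4 && recent < previous / 2
}
```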
Lessons Learned & Future Outlook
- Data quality is always more important than data quantity. Ensure every event you collect is accurate, clean, and meaningful.
- Progressive iteration is king. Don't try to build the perfect system from day one. Start with simple logging, gradually add recall strategies, and only then consider a ranking model.
- Be business-value-oriented. Every analytics feature and recommendation function should be tied to a clear business goal (e.g., increase user dwell time, improve the like rate).
- User experience comes first. Data collection and recommendation calculations must not impact the core user experience. Asynchronous processing is your best friend.
As the project grows, we plan to go deeper in these areas:
- Multimodal Understanding: Combine computer vision (CV) and natural language processing (NLP) to gain a deeper understanding of photo and comment content.
- Real-time Personalization: Dynamically adjust recommendation strategies based on a user's real-time behavior stream.
- Data Privacy and Compliance: Provide users with more transparent explanations for recommendations and more autonomous control over their data.
In Conclusion
Analytics isn't just about data reports; it's the foundation of product intelligence. Through this hands-on practice, I've come to deeply appreciate that the most valuable analytics system is the one that can continuously convert "data" into "user value."
From a free, self-built solution to a complex smart recommendation system, this path isn't easy, but every step we take is accumulating the most precious assets for our product's future.
Remember: data is the new oil, and analytics is the refinery you build with your own hands.
Coming Up Next: In the next chapter, "Going Live: Pushing the 'Launch' Button," we will complete the final step of our product's journey—the official launch. We'll discuss production environment configuration, setting up monitoring and alerts, and how to establish a sustainable operations system. From development to launch, from features to operations, let's get your product truly out into the market.
Content Copyright Notice
This tutorial content is original technical sharing protected by copyright law. Learning and discussion are welcome, but unauthorized reproduction, copying, or commercial use is prohibited. Please cite the source when referencing.