Change how the updater works with the app
The DocsUpdater, which fetches the latest Feide Docs, currently runs as part of the web service. That works, but it is not ideal for production. Below is a suggestion from Claude-3.7-thinking: run the updater as a separate job that shares a volume with the web service, so updates do not interfere with serving requests. Exactly how this will work with Platon is still unknown.
Claude Suggestion:
The user is asking for a more production-ready solution for updating documentation in their Kubernetes environment, as they're experiencing issues with the current implementation where the Uvicorn server restarts during document updates.

Let's analyze what a more production-ready solution would look like for this use case. There are several approaches:
- Separate Kubernetes CronJob: Create a dedicated Kubernetes CronJob that runs independently of the main application to handle document updates.
- Sidecar Container: Use a sidecar container in the same pod that handles document updates and shares the updated documents with the main container.
- External Job System: Use a system like Apache Airflow, AWS Batch, or similar to handle the document update process outside of Kubernetes.
- Message Queue-Based Approach: Implement a message queue system (RabbitMQ, Kafka, etc.) where a producer schedules updates and a consumer processes them.
The Kubernetes CronJob seems to be the most appropriate and straightforward solution for this use case. It would:
- Run on a schedule
- Handle document updates independently from the web service
- Share data through a persistent volume claim
- Not cause service restarts
I'll create a comprehensive solution that includes:
- A Kubernetes CronJob manifest
- Necessary volume configurations
- Modified app code to correctly handle the updated data
- Explanation of how this approach improves reliability
A More Production-Ready Solution for Document Updates
Your current implementation has the document updating logic embedded directly in your web service, which causes issues like unwanted restarts when running with hot-reload. For a more production-ready solution, it's better to separate concerns and follow the single-responsibility principle.
Here's a robust solution using Kubernetes CronJobs:
1. Create a Separate CronJob for Document Updates
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: $CI_ENVIRONMENT_SLUG-docs-updater
  namespace: $KUBE_NAMESPACE
  labels:
    app: $CI_ENVIRONMENT_SLUG-docs-updater
spec:
  schedule: "0 1 * * *"  # Run at 1:00 AM every day
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: docs-updater
              image: $CI_REGISTRY_IMAGE:$CI_REGISTRY_TAG
              command: ["python", "-m", "feide_docs_rag.jobs.update_docs"]
              volumeMounts:
                - name: shared-docs
                  mountPath: /app/feide_docs_rag/local_docs
                - name: shared-vectorstore
                  mountPath: /app/vectorstore-data
              env:
                - name: RUNNING_ENVIRONMENT
                  value: $RUNNING_ENVIRONMENT
                - name: ACCESS_TOKEN_DOCS_GITLAB
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: ACCESS_TOKEN_DOCS_GITLAB
                - name: AZURE_OPENAI_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: AZURE_OPENAI_API_KEY
                - name: AZURE_OPENAI_ENDPOINT
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: AZURE_OPENAI_ENDPOINT
                - name: AZURE_OPENAI_API_VERSION
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: AZURE_OPENAI_API_VERSION
                - name: AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME
                - name: DB_HOST
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: DB_HOST
                - name: DB_NAME
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: DB_NAME
                - name: DB_USER
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: DB_USER
                - name: DB_PASS
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: DB_PASS
          volumes:
            - name: shared-docs
              persistentVolumeClaim:
                claimName: $CI_ENVIRONMENT_SLUG-docs-pvc
            - name: shared-vectorstore
              persistentVolumeClaim:
                claimName: $CI_ENVIRONMENT_SLUG-vectorstore-pvc
          restartPolicy: OnFailure
```
2. Create Persistent Volume Claims
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $CI_ENVIRONMENT_SLUG-docs-pvc
  namespace: $KUBE_NAMESPACE
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $CI_ENVIRONMENT_SLUG-vectorstore-pvc
  namespace: $KUBE_NAMESPACE
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```

Note that `ReadWriteMany` (needed because the CronJob pod and the web pods mount these volumes at the same time) is only supported by some storage classes, typically NFS-backed ones; whether the cluster offers such a class needs to be checked.
3. Update Deployment to Use Shared Volumes
```yaml
# ... existing code ...
spec:
  containers:
    - name: app
      # ... existing code ...
      volumeMounts:
        - name: shared-docs
          mountPath: /app/feide_docs_rag/local_docs
        - name: shared-vectorstore
          mountPath: /app/vectorstore-data
  # ... existing code ...
  volumes:
    - name: shared-docs
      persistentVolumeClaim:
        claimName: $CI_ENVIRONMENT_SLUG-docs-pvc
    - name: shared-vectorstore
      persistentVolumeClaim:
        claimName: $CI_ENVIRONMENT_SLUG-vectorstore-pvc
# ... existing code ...
```
4. Create a Standalone Update Script
```python
#!/usr/bin/env python
"""
Standalone script to update documentation and vector store.
This is meant to be run as a cron job separate from the web server.
"""
import os
import sys
import logging
import asyncio
from datetime import datetime

import pytz
from sqlalchemy.exc import OperationalError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

# Add project root to path if running as script
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../..")))

from feide_docs_rag.database import db_connection_string
from rag_utils import VectorStore
from langchain_openai import AzureOpenAIEmbeddings
from feide_docs_rag.env_config import collection_name
from feide_docs_rag.configuration import FeideConfiguration
from feide_docs_rag.data_loader.loader import load_docs_from_web
from feide_docs_rag.data_loader.docs_updater import DocsUpdater

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)
logger = logging.getLogger(__name__)

if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())


@retry(
    retry=retry_if_exception_type(OperationalError),
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=20),
    reraise=True,
)
async def update_vectorstore():
    """Update the vector store with fresh data from the web."""
    try:
        logger.info(f"Starting vector store update at {datetime.now(pytz.timezone('CET'))}")
        # Initialize the vector store
        embeddings = AzureOpenAIEmbeddings(
            api_version=os.environ["AZURE_OPENAI_API_VERSION"],
            azure_deployment=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
        )
        vectorstore = VectorStore(
            collection_name,
            db_connection_string,
            embeddings=embeddings,
            retriever_search_type="similarity_score_threshold",
            retriever_kwargs={"score_threshold": 0.4, "k": 8},
        )
        # Update documentation from GitLab
        docs_updater = DocsUpdater(
            gitlab_url="https://gitlab.sikt.no",
            gitlab_token=os.getenv("ACCESS_TOKEN_DOCS_GITLAB"),
            gitlab_project_id=1141,
            document_path="feide_docs_rag/local_docs",
        )
        updated = docs_updater.update_docs()
        if updated:
            logger.info("Documentation updated from GitLab, updating vector store")
            # Process documents and update vector store
            docs = FeideConfiguration(
                base_url="https://docs.feide.no",
                document_path="feide_docs_rag/local_docs",
                web_loader_function=load_docs_from_web,
                random_state=42,
            ).get_docs()
            await vectorstore.setup_vectorstore(docs)
            logger.info(f"Vector store update completed at {datetime.now(pytz.timezone('CET'))}")
        else:
            logger.info("No documentation updates from GitLab, skipping vector store update")
    except Exception as e:
        logger.error(f"Error updating vector store: {str(e)}", exc_info=True)
        raise


async def main():
    """Main entry point for the script."""
    try:
        await update_vectorstore()
    except Exception as e:
        logger.error(f"Update process failed: {str(e)}", exc_info=True)
        sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())
```
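The tenacity decorator above retries only database `OperationalError`s, up to 5 attempts with exponential backoff capped at 20 seconds, and reraises on final failure. A minimal stdlib-only sketch of the same policy (the helper name and the use of `ValueError` as a stand-in for a transient error are illustrative, not part of the codebase):

```python
import time


def retry_with_backoff(fn, retry_on, attempts=5, base=2.0, cap=20.0):
    """Call fn(), retrying only on `retry_on` exceptions with exponential backoff.

    Approximates the tenacity policy used in the update script:
    stop_after_attempt(5) + wait_exponential(min=2, max=20) + reraise=True.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # like reraise=True: give up after the last attempt
            # wait 2s, 4s, 8s, ... capped at 20s (base=2.0, cap=20.0)
            time.sleep(min(cap, base * 2 ** (attempt - 1)))
```

The point of restricting the exception type is that only transient database failures are worth retrying; a programming error should fail the job immediately so the CronJob's failure history surfaces it.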
5. Update Your App Lifecycle to Remove Scheduled Updates
```python
from contextlib import asynccontextmanager

from feide_docs_rag.feide_chain import chain_service, vectorstore
from rag_utils import DefaultRagApp  # type: ignore


@asynccontextmanager
async def lifespan(app: DefaultRagApp):
    # No scheduling needed here anymore
    yield
    # No cleanup needed


app = DefaultRagApp(chain_service, lifespan=lifespan)
```
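One gap the suggestion leaves open is how the web service notices that the CronJob refreshed the shared volume: if the app caches retriever or document state in memory, it will keep serving stale data until restarted. A lightweight option is to watch a marker file on the shared volume and trigger a reload when its mtime changes. This is a hypothetical sketch; the marker path, the idea that the update job touches it after a successful run, and the reload callback are all assumptions, not existing code:

```python
import os


class SharedVolumeWatcher:
    """Fires a callback when a marker file on the shared volume gets a newer mtime.

    Hypothetical: assumes the update job touches `marker_path`
    (e.g. /app/vectorstore-data/.last_update) after each successful run.
    """

    def __init__(self, marker_path: str, on_update) -> None:
        self.marker_path = marker_path
        self.on_update = on_update
        self._last_mtime = None  # mtime seen on the previous check

    def check_once(self) -> bool:
        """Return True (and invoke the callback) if the marker changed since last check."""
        try:
            mtime = os.path.getmtime(self.marker_path)
        except FileNotFoundError:
            return False  # update job has not run yet
        if self._last_mtime is not None and mtime > self._last_mtime:
            self._last_mtime = mtime
            self.on_update()
            return True
        self._last_mtime = mtime  # first sighting just records the baseline
        return False
```

`check_once` could then be polled every few minutes from a background task started in the lifespan above. If the vector store lives entirely in Postgres and the app queries it per request, no reload may be needed at all; that is worth verifying before adding this machinery.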
6. Update Your CI/CD Pipeline
```yaml
# ... existing code ...
production:
  extends:
    - .production
    - .vault-secrets
  stage: production
  variables:
    RUNNING_ENVIRONMENT: "production"
    REPLICAS: "2"
  script:
    - deploy deployment.yaml
    - deploy volumes.yaml
    - deploy cronjob.yaml
# ... existing code for staging and review environments ...
```
Benefits of This Approach
- Separation of Concerns: The web service focuses on serving requests, while the document update logic runs independently.
- Reliability: No service disruptions or restarts when documentation is updated.
- Scalability: Each component can be scaled independently.
- Resource Efficiency: Document updates run on a schedule without consuming resources continuously.
- Failure Isolation: If the document update process fails, it doesn't affect the web service.
- Better Operability: Separate logs and monitoring for each concern make troubleshooting easier.
This solution leverages Kubernetes native capabilities to create a more reliable and maintainable system that follows best practices for production deployments.