Change how the updater works with the app
The DocsUpdater, which fetches the latest Feide Docs, currently runs as part of the web service. That works, but it is not ideal for production. Below is a suggestion from Claude-3.7-thinking: run the updater as a separate job that shares a volume with the web service, so updates do not interfere with serving requests. Exactly how this will work with Platon is still unknown.
Claude Suggestion:
The user is asking for a more production-ready solution for updating documentation in their Kubernetes environment, as they're experiencing issues with the current implementation where the Uvicorn server restarts during document updates.

Let's analyze what a more production-ready solution would look like for this use case. There are several approaches:
- Separate Kubernetes CronJob: Create a dedicated Kubernetes CronJob that runs independently of the main application to handle document updates.
- Sidecar Container: Use a sidecar container in the same pod that handles document updates and shares the updated documents with the main container.
- External Job System: Use a system like Apache Airflow, AWS Batch, or similar to handle the document update process outside of Kubernetes.
- Message Queue-Based Approach: Implement a message queue system (RabbitMQ, Kafka, etc.) where a producer schedules updates and a consumer processes them.
The Kubernetes CronJob seems to be the most appropriate and straightforward solution for this use case. It would:
- Run on a schedule
- Handle document updates independently from the web service
- Share data through a persistent volume claim
- Not cause service restarts
I'll create a comprehensive solution that includes:
- A Kubernetes CronJob manifest
- Necessary volume configurations
- Modified app code to correctly handle the updated data
- Explanation of how this approach improves reliability
A More Production-Ready Solution for Document Updates
Your current implementation has the document updating logic embedded directly in your web service, which causes issues like unwanted restarts when running with hot-reload. For a more production-ready solution, it's better to separate concerns and follow the single-responsibility principle.
Here's a robust solution using Kubernetes CronJobs:
1. Create a Separate CronJob for Document Updates
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: $CI_ENVIRONMENT_SLUG-docs-updater
  namespace: $KUBE_NAMESPACE
  labels:
    app: $CI_ENVIRONMENT_SLUG-docs-updater
spec:
  schedule: "0 1 * * *"  # Run at 1:00 AM every day
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: docs-updater
              image: $CI_REGISTRY_IMAGE:$CI_REGISTRY_TAG
              command: ["python", "-m", "feide_docs_rag.jobs.update_docs"]
              volumeMounts:
                - name: shared-docs
                  mountPath: /app/feide_docs_rag/local_docs
                - name: shared-vectorstore
                  mountPath: /app/vectorstore-data
              env:
                - name: RUNNING_ENVIRONMENT
                  value: $RUNNING_ENVIRONMENT
                - name: ACCESS_TOKEN_DOCS_GITLAB
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: ACCESS_TOKEN_DOCS_GITLAB
                - name: AZURE_OPENAI_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: AZURE_OPENAI_API_KEY
                - name: AZURE_OPENAI_ENDPOINT
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: AZURE_OPENAI_ENDPOINT
                - name: AZURE_OPENAI_API_VERSION
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: AZURE_OPENAI_API_VERSION
                - name: AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME
                - name: DB_HOST
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: DB_HOST
                - name: DB_NAME
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: DB_NAME
                - name: DB_USER
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: DB_USER
                - name: DB_PASS
                  valueFrom:
                    secretKeyRef:
                      name: $CI_ENVIRONMENT_SLUG-app-secrets
                      key: DB_PASS
          volumes:
            - name: shared-docs
              persistentVolumeClaim:
                claimName: $CI_ENVIRONMENT_SLUG-docs-pvc
            - name: shared-vectorstore
              persistentVolumeClaim:
                claimName: $CI_ENVIRONMENT_SLUG-vectorstore-pvc
          restartPolicy: OnFailure
```
2. Create Persistent Volume Claims
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $CI_ENVIRONMENT_SLUG-docs-pvc
  namespace: $KUBE_NAMESPACE
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $CI_ENVIRONMENT_SLUG-vectorstore-pvc
  namespace: $KUBE_NAMESPACE
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```

Note that `ReadWriteMany` (needed because the CronJob pod and the web pods mount these volumes at the same time) is only supported by some storage classes, typically NFS-backed ones; whether the cluster offers such a class needs to be checked.
3. Update Deployment to Use Shared Volumes
```yaml
# ... existing code ...
spec:
  containers:
    - name: app
      # ... existing code ...
      volumeMounts:
        - name: shared-docs
          mountPath: /app/feide_docs_rag/local_docs
        - name: shared-vectorstore
          mountPath: /app/vectorstore-data
  # ... existing code ...
  volumes:
    - name: shared-docs
      persistentVolumeClaim:
        claimName: $CI_ENVIRONMENT_SLUG-docs-pvc
    - name: shared-vectorstore
      persistentVolumeClaim:
        claimName: $CI_ENVIRONMENT_SLUG-vectorstore-pvc
# ... existing code ...
```
4. Create a Standalone Update Script
```python
#!/usr/bin/env python
"""
Standalone script to update documentation and vector store.
This is meant to be run as a cron job separate from the web server.
"""
import os
import sys
import logging
import asyncio
from datetime import datetime

import pytz
from sqlalchemy.exc import OperationalError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

# Add project root to path if running as script
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../..")))

from feide_docs_rag.database import db_connection_string
from rag_utils import VectorStore
from langchain_openai import AzureOpenAIEmbeddings
from feide_docs_rag.env_config import collection_name
from feide_docs_rag.configuration import FeideConfiguration
from feide_docs_rag.data_loader.loader import load_docs_from_web
from feide_docs_rag.data_loader.docs_updater import DocsUpdater

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)
logger = logging.getLogger(__name__)

if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())


@retry(
    retry=retry_if_exception_type(OperationalError),
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=20),
    reraise=True,
)
async def update_vectorstore():
    """Update the vector store with fresh data from the web."""
    try:
        logger.info(f"Starting vector store update at {datetime.now(pytz.timezone('CET'))}")
        # Initialize the vector store
        embeddings = AzureOpenAIEmbeddings(
            api_version=os.environ["AZURE_OPENAI_API_VERSION"],
            azure_deployment=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
        )
        vectorstore = VectorStore(
            collection_name,
            db_connection_string,
            embeddings=embeddings,
            retriever_search_type="similarity_score_threshold",
            retriever_kwargs={"score_threshold": 0.4, "k": 8},
        )
        # Update documentation from GitLab
        docs_updater = DocsUpdater(
            gitlab_url="https://gitlab.sikt.no",
            gitlab_token=os.getenv("ACCESS_TOKEN_DOCS_GITLAB"),
            gitlab_project_id=1141,
            document_path="feide_docs_rag/local_docs",
        )
        updated = docs_updater.update_docs()
        if updated:
            logger.info("Documentation updated from GitLab, updating vector store")
            # Process documents and update vector store
            docs = FeideConfiguration(
                base_url="https://docs.feide.no",
                document_path="feide_docs_rag/local_docs",
                web_loader_function=load_docs_from_web,
                random_state=42,
            ).get_docs()
            await vectorstore.setup_vectorstore(docs)
            logger.info(f"Vector store update completed at {datetime.now(pytz.timezone('CET'))}")
        else:
            logger.info("No documentation updates from GitLab, skipping vector store update")
    except Exception as e:
        logger.error(f"Error updating vector store: {str(e)}", exc_info=True)
        raise


async def main():
    """Main entry point for the script."""
    try:
        await update_vectorstore()
    except Exception as e:
        logger.error(f"Update process failed: {str(e)}", exc_info=True)
        sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())
```
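The tenacity decorator above retries only database `OperationalError`s, up to 5 attempts with exponential backoff capped at 20 seconds, and reraises on final failure. A minimal stdlib-only sketch of the same policy (the helper name and the use of `ValueError` as a stand-in for a transient error are illustrative, not part of the codebase):

```python
import time


def retry_with_backoff(fn, retry_on, attempts=5, base=2.0, cap=20.0):
    """Call fn(), retrying only on `retry_on` exceptions with exponential backoff.

    Approximates the tenacity policy used in the update script:
    stop_after_attempt(5) + wait_exponential(min=2, max=20) + reraise=True.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # like reraise=True: give up after the last attempt
            # wait 2s, 4s, 8s, ... capped at 20s (base=2.0, cap=20.0)
            time.sleep(min(cap, base * 2 ** (attempt - 1)))
```

The point of restricting the exception type is that only transient database failures are worth retrying; a programming error should fail the job immediately so the CronJob's failure history surfaces it.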
5. Update Your App Lifecycle to Remove Scheduled Updates
```python
from contextlib import asynccontextmanager

from feide_docs_rag.feide_chain import chain_service, vectorstore
from rag_utils import DefaultRagApp  # type: ignore


@asynccontextmanager
async def lifespan(app: DefaultRagApp):
    # No scheduling needed here anymore
    yield
    # No cleanup needed


app = DefaultRagApp(chain_service, lifespan=lifespan)
```
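One gap the suggestion leaves open is how the web service notices that the CronJob refreshed the shared volume: if the app caches retriever or document state in memory, it will keep serving stale data until restarted. A lightweight option is to watch a marker file on the shared volume and trigger a reload when its mtime changes. This is a hypothetical sketch; the marker path, the idea that the update job touches it after a successful run, and the reload callback are all assumptions, not existing code:

```python
import os


class SharedVolumeWatcher:
    """Fires a callback when a marker file on the shared volume gets a newer mtime.

    Hypothetical: assumes the update job touches `marker_path`
    (e.g. /app/vectorstore-data/.last_update) after each successful run.
    """

    def __init__(self, marker_path: str, on_update) -> None:
        self.marker_path = marker_path
        self.on_update = on_update
        self._last_mtime = None  # mtime seen on the previous check

    def check_once(self) -> bool:
        """Return True (and invoke the callback) if the marker changed since last check."""
        try:
            mtime = os.path.getmtime(self.marker_path)
        except FileNotFoundError:
            return False  # update job has not run yet
        if self._last_mtime is not None and mtime > self._last_mtime:
            self._last_mtime = mtime
            self.on_update()
            return True
        self._last_mtime = mtime  # first sighting just records the baseline
        return False
```

`check_once` could then be polled every few minutes from a background task started in the lifespan above. If the vector store lives entirely in Postgres and the app queries it per request, no reload may be needed at all; that is worth verifying before adding this machinery.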
6. Update Your CI/CD Pipeline
```yaml
# ... existing code ...
production:
  extends:
    - .production
    - .vault-secrets
  stage: production
  variables:
    RUNNING_ENVIRONMENT: "production"
    REPLICAS: "2"
  script:
    - deploy deployment.yaml
    - deploy volumes.yaml
    - deploy cronjob.yaml
# ... existing code for staging and review environments ...
```
Benefits of This Approach
- Separation of Concerns: The web service focuses on serving requests, while the document update logic runs independently.
- Reliability: No service disruptions or restarts when documentation is updated.
- Scalability: Each component can be scaled independently.
- Resource Efficiency: Document updates run on a schedule without consuming resources continuously.
- Failure Isolation: If the document update process fails, it doesn't affect the web service.
- Better Operability: Separate logs and monitoring for each concern make troubleshooting easier.
This solution leverages Kubernetes native capabilities to create a more reliable and maintainable system that follows best practices for production deployments.