Build and deploy a Streamlit application that performs semantic search using Weaviate
By the end of this workshop, students will be able to:
Add the GitHub repo or the Streamlit app URL to the Google spreadsheet, in the "Weaviate Search Engine" column.
The challenge of this workshop is to have an app ready in 3 hours!
You can only achieve that velocity if you work with a decent AI coding platform. I suggest Windsurf, which is much more complete than VS Code with Copilot. Cursor, Claude Code (by far the best at the moment), and Manus are also great.
If you have never worked with these platforms now is the time to make the jump!
After the 1st project, you should have a dataset available in MongoDB, either locally or on Atlas.
If that's the case, you import it from MongoDB into Weaviate.
Else:
In all cases, you must (re)create the embeddings.
You should use Windsurf or Claude Code: a game changer. Copilot is autocomplete on steroids, whereas Windsurf is an experienced code engineer with access to the whole codebase. Other AI coding platforms such as Cursor are also fantastic.
Streamlit is a simplified framework for publishing data-oriented websites with a few lines of Python. The learning curve is fast.
When you commit your code to a public GitHub repo, Streamlit can host it for free!
If you don't know Streamlit yet, check out the playground.
The simple solution is to download your dataset from MongoDB as JSON and then upload / import it into Weaviate.
But there are also several ways to stream data directly from MongoDB Atlas to Weaviate without intermediate files:
Ask your AI to write a Python script that connects to both databases simultaneously and streams data.
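One possible shape for such a script, assuming a MongoDB collection named `movies` in a `workshop` database and an existing Weaviate collection named `Movie` (all three names are placeholders), using the v4 Python client:

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of up to `size` items (keeps memory usage flat)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def stream_mongo_to_weaviate(mongo_uri="mongodb://localhost:27017"):
    # Imports deferred so the helper above works without these packages installed.
    import pymongo
    import weaviate

    docs = pymongo.MongoClient(mongo_uri)["workshop"]["movies"].find(
        {}, {"_id": 0}  # drop Mongo's ObjectId, which Weaviate can't store as-is
    )
    client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud(...)
    movies = client.collections.get("Movie")
    with movies.batch.dynamic() as batch:
        for chunk in batched(docs, 100):
            for doc in chunk:
                batch.add_object(properties=doc)
    client.close()
```

The cursor streams documents lazily, so the script never holds the whole dataset in memory.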
You can import data into Weaviate without creating a schema. Weaviate will use all default settings, and infer what data type you use.
Or you can define a schema, which is advised to optimize Weaviate's performance.
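For illustration, here is a schema sketch for a hypothetical movie dataset with the v4 client; the collection and field names are assumptions, so adapt them to your own data:

```python
# Hypothetical fields for a movie dataset; replace with your own.
MOVIE_PROPERTIES = {"title": "text", "plot": "text", "year": "int"}

def create_movie_collection():
    import weaviate
    import weaviate.classes.config as wc

    type_map = {"text": wc.DataType.TEXT, "int": wc.DataType.INT}
    client = weaviate.connect_to_local()
    client.collections.create(
        "Movie",
        # We compute embeddings ourselves, so no server-side vectorizer.
        vectorizer_config=wc.Configure.Vectorizer.none(),
        properties=[
            wc.Property(name=name, data_type=type_map[kind])
            for name, kind in MOVIE_PROPERTIES.items()
        ],
    )
    client.close()
```

Declaring `year` as an `INT` up front avoids Weaviate inferring a wrong type from the first imported document.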
The simplest and cost-free way to create embeddings is to use SentenceTransformer from Hugging Face:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
Another cost-free way is to use Ollama locally with the nomic-embed-text model (download Ollama first). That would be my favorite solution. However, when you publish the app, you need a way to compute embeddings for the query, and setting up Ollama as a remote API endpoint is too complex for this workshop. So we'll stick with SentenceTransformer.
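A minimal sketch of the embedding step with SentenceTransformer; the helper `build_text_to_embed` and its field names are assumptions about your documents:

```python
def build_text_to_embed(doc, fields=("title", "plot")):
    """Concatenate a document's searchable fields into one string to embed."""
    return " ".join(str(doc.get(f, "")) for f in fields).strip()

def embed_texts(texts, model_name="all-MiniLM-L6-v2"):
    """Return one embedding (a list of floats) per input text."""
    from sentence_transformers import SentenceTransformer  # heavy import, kept local
    model = SentenceTransformer(model_name)
    return [vec.tolist() for vec in model.encode(texts)]
```

In the v4 client, pass the vector alongside the properties when inserting, e.g. `batch.add_object(properties=doc, vector=embedding)`, and reuse the same model to embed the user's query at search time.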
Then
In a left sidebar, the user can also select:
You are absolutely free to add the features you like.
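As a starting point, a minimal `app.py` with a sidebar; `semantic_search` is a placeholder you would implement with a Weaviate `near_vector` query:

```python
RESULT_LIMITS = [5, 10, 20, 50]  # choices offered in the sidebar

def semantic_search(query, limit, certainty):
    """Placeholder: embed `query` and run a Weaviate near_vector search here."""
    return []

def main():
    import streamlit as st

    st.title("Semantic Search")
    with st.sidebar:
        limit = st.selectbox("Number of results", RESULT_LIMITS, index=1)
        certainty = st.slider("Minimum certainty", 0.0, 1.0, 0.6)

    query = st.text_input("What are you looking for?")
    for hit in semantic_search(query, limit, certainty):
        st.subheader(hit.get("title", ""))
        st.write(hit.get("plot", ""))

if __name__ == "__main__":
    main()
```

Run it locally with `streamlit run app.py`.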
The GitHub repository should contain:
app.py (main Streamlit application)
requirements.txt (Python dependencies)
README.md (project documentation)
config.py (configuration management)
.streamlit/secrets.toml (for sensitive data)

Create a requirements.txt file with:
pip freeze > requirements.txt
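Note that `pip freeze` pins every package in your environment; for Streamlit Cloud, a leaner hand-curated file often deploys faster. A plausible minimal version for this stack (package set is an assumption based on the tools used in this workshop):

```
streamlit
weaviate-client>=4.0
sentence-transformers
pymongo
```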
To deploy on Streamlit Cloud:
Create a Streamlit account on https://streamlit.io/cloud
Make sure your app runs locally: Before deploying, test your app one last time locally to catch any last-minute errors.
Commit and Push: Commit all your changes (including app.py, requirements.txt, and any other necessary files) to your GitHub repository.
Log In: Go to https://streamlit.io/cloud and log in using your GitHub account.
Click "New App": On your Streamlit Cloud dashboard, look for a button or link labeled "New App" or something similar. Click it.
Repository: A dropdown menu will appear, allowing you to select the GitHub repository containing your Streamlit app. Find your repository in the list and select it. You may need to grant Streamlit Cloud permission to access your repositories if this is your first time.
Branch: Select the branch from which you want to deploy.
Main File Path: Specify the path to your main Streamlit application file (usually app.py). If your app.py file is in the root directory of your repository, you can simply enter app.py. If it's in a subdirectory, specify the full path (e.g., src/app.py).
Click "Deploy!": Once you've configured everything, click the "Deploy!" button.
Reranking seeks to improve search relevance by reordering the result set returned by a search with a different model.
see https://weaviate.io/developers/weaviate/concepts/reranking
Note: the Hugging Face reranker module requires a Hugging Face API key. I'm not sure whether they offer a free tier.
Use models like cross-encoder/ms-marco-MiniLM-L-6-v2
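A sketch of a reranked query with the v4 client, assuming the reranker module is enabled on your Weaviate instance and a `Movie` collection with a `plot` property (both are assumptions):

```python
RERANKER_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # configured server-side

def search_with_rerank(query_vector, query_text, limit=20):
    """Vector search first, then let the reranker reorder the top `limit` hits."""
    import weaviate
    from weaviate.classes.query import Rerank

    client = weaviate.connect_to_local()
    movies = client.collections.get("Movie")
    response = movies.query.near_vector(
        near_vector=query_vector,
        limit=limit,
        # `prop` is the text property the cross-encoder scores against the query
        rerank=Rerank(prop="plot", query=query_text),
    )
    client.close()
    return [obj.properties for obj in response.objects]
```

The vector search narrows the candidates cheaply; the cross-encoder then scores only those `limit` candidates, which is why reranking stays affordable.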
Test your application with:
Once you have deployed your app, share the link in the Discord and in the spreadsheet.
Don't forget to add your name in the app so I know who did it.
First one to finish gets a high five!