Building Smarter Web Search Workflows
Learn how to integrate programmable web search into your AI workflows using tools like SerpAPI, Tavily, SearxNG, Exa, and Google CSE. This guide covers geo-targeted queries, proxy strategies, and building a live, LangChain-powered search-to-summarization pipeline with Streamlit and OpenAI.
Search is central to intelligent applications, whether it's for real-time data retrieval, summarization, research, or chatbot interactions.
In this post, we'll explore a variety of tools and APIs that enable programmable web search, show how to incorporate them into Python workflows, discuss proxying strategies for geographic targeting, and build a simple LangChain-powered pipeline for an end-to-end application.
Overview of Web Search Tools
Here's a breakdown of the most useful web search APIs and meta search engines:
Tool | Description | Python Support | Geo-targeting Support | Free Tier Availability |
---|---|---|---|---|
SerpApi | Google Search API wrapper | Yes | Yes (location param) | Yes (limited) |
SearxNG | Open source meta search engine | Yes | Yes (via language ) | Yes |
Tavily | Fast, hosted web search API | Yes | Yes (location param) | Yes |
Exa AI | AI-powered semantic search | Yes | Yes (location param) | Yes |
Google Programmable Search | Custom Google-powered search | Yes | Yes (gl , hl ) | Yes |
SerpAPI
SerpApi wraps Google Search and supports advanced result types such as rich snippets, shopping, and maps.
from serpapi import GoogleSearch
params = {
    "q": "latest AI tools",
    "location": "London, UK",
    "api_key": "YOUR_API_KEY"
}
search = GoogleSearch(params)
results = search.get_dict()
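The returned dict follows SerpApi's JSON schema, with organic hits under the `organic_results` key. A minimal sketch of pulling out title/link pairs (the sample payload below is illustrative, not real API output):

```python
# Extract (title, link) pairs from a SerpApi-style results dict.
# The sample payload is illustrative, not real API output.
def organic_links(results: dict) -> list:
    return [
        (r["title"], r["link"])
        for r in results.get("organic_results", [])
        if "title" in r and "link" in r
    ]

sample = {
    "organic_results": [
        {"title": "AI Tool A", "link": "https://example.com/a"},
        {"title": "AI Tool B", "link": "https://example.com/b"},
        {"position": 3},  # entries missing title/link are skipped
    ]
}
print(organic_links(sample))
# → [('AI Tool A', 'https://example.com/a'), ('AI Tool B', 'https://example.com/b')]
```

Guarding on both keys keeps the pipeline from crashing on ads, snippets, or partial entries that SerpApi sometimes interleaves with organic results.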
SearxNG (Self-hosted Meta Search)
SearxNG aggregates results from multiple engines and is privacy-focused. It can be used via direct HTTP requests or through Python.
import requests
params = {
    'q': 'best pizza in NYC',
    'format': 'json',
    'language': 'en-US'  # SearxNG's search API uses 'language' for locale targeting
}
response = requests.get("http://localhost:8888/search", params=params)
print(response.json())
To use SearxNG, you can either connect to public instances or deploy your own with Docker.
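A quick self-hosted setup, assuming Docker and the official `searxng/searxng` image (which serves on container port 8080); the host port matches the URL used in the example above:

```shell
# Run SearxNG locally on port 8888 (note: JSON output must be enabled
# in the instance's settings.yml for the 'format=json' parameter to work)
docker run -d --name searxng -p 8888:8080 searxng/searxng
```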
Tavily
Tavily offers fast web search with simple API access.
from tavily import TavilyClient
client = TavilyClient("tvly-YOUR_API_KEY")
results = client.search("tech news", location="Berlin, Germany")
Exa AI
Exa allows semantic and keyword-based web search with fast indexing and relevance filtering.
import requests
headers = {'Authorization': 'Bearer YOUR_API_KEY'}
data = {'query': 'AI research papers', 'location': 'San Francisco, CA'}
response = requests.post('https://api.exa.ai/v1/search', headers=headers, json=data)
Google Programmable Search
Google Custom Search (CSE) can be used for limited programmatic queries.
import requests
params = {
    'q': 'machine learning tutorials',
    'cx': 'YOUR_CX_ID',
    'key': 'YOUR_API_KEY',
    'gl': 'US',
    'hl': 'en'
}
response = requests.get('https://www.googleapis.com/customsearch/v1', params=params)
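CSE returns at most 10 results per request, so deeper result sets are fetched page by page via the 1-based `start` parameter. A sketch using a hypothetical `cse_pages` helper that builds one params dict per page:

```python
# Google CSE caps each request at 10 results; pagination uses the 1-based
# 'start' parameter. cse_pages is a hypothetical helper that builds the
# params dict for each page needed to cover `total` results.
def cse_pages(query, cx, key, total=30, gl="US", hl="en"):
    base = {"q": query, "cx": cx, "key": key, "gl": gl, "hl": hl}
    return [dict(base, start=offset + 1, num=10) for offset in range(0, total, 10)]

pages = cse_pages("machine learning tutorials", "YOUR_CX_ID", "YOUR_API_KEY")
print([p["start"] for p in pages])
# → [1, 11, 21]
```

Each dict in `pages` can be passed as `params` to the `customsearch/v1` request above.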
Proxies and Geographic Relevance
For applications that need region-specific results or must bypass geofencing, proxies are essential.
Using SOCKS5 Proxies in Python
SOCKS5 proxies route all traffic through a specified server, masking your origin IP and location. This is especially handy on cloud servers, whose IP ranges are well known and easily flagged as machine rather than genuine human traffic.
Install PySocks:
pip install pysocks
import requests
proxies = {
    'http': 'socks5h://user:pass@proxy.example.com:1080',
    'https': 'socks5h://user:pass@proxy.example.com:1080'
}
response = requests.get("http://example.com", proxies=proxies)
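Credentials with special characters are easy to mangle in hand-built proxy URLs. A small hypothetical helper keeps them URL-encoded; note that the `socks5h` scheme (as opposed to `socks5`) resolves DNS on the proxy side, which matters for geo-targeting:

```python
from urllib.parse import quote

# Hypothetical helper: build a requests-style proxies dict with URL-encoded
# credentials. 'socks5h' makes DNS resolution happen on the proxy, so
# hostname lookups also appear to originate from the proxy's region.
def socks5_proxies(host, port, user=None, password=None):
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}@" if user else ""
    url = f"socks5h://{auth}{host}:{port}"
    return {"http": url, "https": url}

print(socks5_proxies("proxy.example.com", 1080, "user", "p@ss"))
# both keys map to 'socks5h://user:p%40ss@proxy.example.com:1080'
```

The result can be passed directly as the `proxies=` argument to `requests.get`.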
Webshare.io
Webshare provides a pool of proxies for various regions.
proxies = {
    'http': 'http://username:password@proxy.webshare.io:80',
    'https': 'http://username:password@proxy.webshare.io:80'
}
Cloudflare WARP
Cloudflare WARP encrypts traffic and can be configured to route through specific regions. Connect with:
warp-cli connect
Proxy Services
Service | Type | Authenticated | Location Control |
---|---|---|---|
Webshare | HTTP(S) | Yes | Yes |
Cloudflare WARP | VPN/CLI | No | Limited control |
Bright Data | Residential / Mobile | Yes | Yes |
Oxylabs | Datacenter / Rotating | Yes | Yes |
Building a LangChain Search Pipeline
Integrating search results into a conversational AI can enhance user interactions. We'll now build a simple LangChain-based application that searches the web, converts HTML to Markdown, and uses OpenAI to summarize the result.
- Search with SerpApi
- Fetch the HTML content from the search results
- Convert the HTML to Markdown
- Generate a chat response with OpenAI
Example Code (Streamlit UI)
# pip install streamlit langchain_community langchain_openai python-dotenv
# Requires the two environment variables: OPENAI_API_KEY & SERPAPI_API_KEY in a .env file
import streamlit as st
from langchain_community.utilities import SerpAPIWrapper
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import MarkdownifyTransformer
from langchain_openai import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
load_dotenv()
NUM_WEB_RESULTS = 3
CHUNKS_PER_DOCUMENT = 2

def main():
    st.title("AI-Powered Search Chatbot")
    q = st.text_input("Enter your query:")
    if q:
        try:
            sr = SerpAPIWrapper().results(q).get('organic_results', [])[:NUM_WEB_RESULTS]
            urls = [r.get('link') for r in sr if r.get('link')]
            if not urls:
                return
            md = MarkdownifyTransformer().transform_documents(AsyncHtmlLoader(urls).load())
            ts = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200, length_function=len)
            cc = "\n\n---\n\n".join(chunk for d in md for chunk in ts.split_text(d.page_content)[:CHUNKS_PER_DOCUMENT])
            st.write(
                ChatOpenAI(model="gpt-3.5-turbo").invoke(
                    f"Based on the following web search results, provide a comprehensive answer to the question: '{q}'\n\nSearch Results:\n{cc}\n\nPlease provide a balanced, factual summary based on the information above."
                ).content
            )
        except Exception as e:
            st.error(f"Error: {e}")

if __name__ == "__main__":
    main()
This minimal Streamlit application performs a live search, processes the content, and responds conversationally, all in a few dozen lines of code.
Sources
- SearxNG Documentation
- SerpApi Docs
- Tavily API Docs
- Exa AI Docs
- Google Programmable Search
- Webshare API
- LangChain Docs
This guide should help you build customized, geo-aware search pipelines for intelligent applications. Whether you're building a research assistant or a domain-specific chatbot, these tools provide the foundation for high-quality results and meaningful user interactions.