To build this, we'll use the newsapi-python library to fetch the data and transformers for zero-shot classification. Zero-shot classification lets us sort text into your specific labels without training a separate model on labeled examples for each category.

Prerequisites

You will need an API key from NewsAPI.org. Install the necessary libraries from the terminal:

```bash
pip install newsapi-python transformers torch keybert
```

The Python Script

This script fetches the top 25 headlines, uses a zero-shot classifier to map each one to your categories, and uses KeyBERT to extract up to 15 of the most relevant keywords.

```python
from newsapi import NewsApiClient
from transformers import pipeline
from keybert import KeyBERT

# 1. Setup API and Models
NEWS_API_KEY = 'YOUR_NEWS_API_KEY_HERE'
newsapi = NewsApiClient(api_key=NEWS_API_KEY)

# Load Zero-Shot Classification pipeline (uses bart-large-mnli)
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Load Keyword Extraction model
kw_model = KeyBERT()

# Define your specific categories
CATEGORIES = [
    "Politics", "Business", "Education", "Technology/Science",
    "Government", "Health", "Entertainment", "Legal",
    "Community-Oriented", "Real Estate", "Military/Warfare"
]


def get_news_and_process():
    print("--- Fetching news articles ---")
    # Fetch 25 articles (default top headlines)
    top_headlines = newsapi.get_top_headlines(language='en', page_size=25)
    articles = top_headlines.get('articles', [])

    results = []
    for i, art in enumerate(articles):
        # The API can return null for these fields, so fall back to ''
        title = art.get('title') or ''
        description = art.get('description') or ''

        # Combine title and description for better context
        text_to_analyze = f"{title}. {description}" if description else title
        if not text_to_analyze:
            continue

        print(f"Processing Article {i+1}: {title[:50]}...")

        # 2. Categorization (Zero-Shot)
        # Labels come back sorted by score, so index 0 is the best match
        category_result = classifier(text_to_analyze, candidate_labels=CATEGORIES)
        top_category = category_result['labels'][0]

        # 3. Keyword Extraction (up to 15 keywords)
        # top_n=15 is an upper bound; short texts may yield fewer keywords
        keywords_list = kw_model.extract_keywords(
            text_to_analyze,
            keyphrase_ngram_range=(1, 1),
            stop_words='english',
            top_n=15
        )
        # Each entry is a (keyword, score) tuple; keep only the keyword
        keywords = [kw[0] for kw in keywords_list]

        results.append({
            "title": title,
            "category": top_category,
            "keywords": keywords,
            "url": art.get('url')
        })
    return results


# Execute and Print
if __name__ == "__main__":
    news_data = get_news_and_process()

    print("\n" + "="*50)
    print("PROCESSED ARTICLES REPORT")
    print("="*50 + "\n")

    for item in news_data:
        print(f"TITLE: {item['title']}")
        print(f"CATEGORY: [{item['category']}]")
        print(f"KEYWORDS: {', '.join(item['keywords'])}")
        print("-" * 30)
```

Key Technical Details

Zero-Shot Classification: A standard classifier can only predict the fixed labels it was trained on (for example, "Positive/Negative"). The facebook/bart-large-mnli model instead scores how well the text supports a statement built from each of your labels, so it works from the meaning of the label. It can decide an article is about "Military/Warfare" even if that exact word isn't in the text.

KeyBERT: This uses BERT-style embeddings to find the words or phrases that are most similar to the document itself. With keyphrase_ngram_range=(1, 1), as in the script, it returns single words. For short snippets like news descriptions it often works better than statistical methods such as RAKE or YAKE, and a snippet with few distinct words can return fewer than 15 keywords.

NewsAPI Limitation: The free version of NewsAPI primarily provides snippets (titles and descriptions). If you need the full body text, you would typically use a library like newspaper3k to scrape the url provided in the API response.

Troubleshooting Tip

If the script runs slowly, it's because the transformers model is heavy. If you have a dedicated GPU, make sure the classifier runs on it by passing device=0 to the pipeline() call.
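The device=0 tip can be made safe for machines that have no GPU. Here is a minimal sketch, assuming a helper named pick_device (a hypothetical name, not part of transformers or torch). It relies on the pipeline convention that device=-1 means CPU and device=0 means the first CUDA GPU.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) when one is usable, otherwise -1 (CPU)."""
    try:
        # Imported lazily so the helper still works if torch is not installed
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


if __name__ == "__main__":
    device = pick_device()
    print(f"Selected device index: {device}")
    # Assumed usage, mirroring the classifier setup in the script:
    # classifier = pipeline("zero-shot-classification",
    #                       model="facebook/bart-large-mnli",
    #                       device=device)
```

Passing the returned value as device= lets the same script run unchanged on a CPU-only laptop and on a GPU machine, rather than failing when device=0 is hard-coded and no GPU is present.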