Description
I have extracted some meta tags. You can try to identify the title, text, description and date by substituting each of the tags listed below into these selectors:
meta[property='{}']
meta[name='{}']
meta[itemprop='{}']
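As a minimal sketch of how the three templates above could be expanded for a single tag name (the helper name `build_selectors` is hypothetical and not part of any library):

```python
# The three selector templates from the issue, with '{}' as the placeholder.
SELECTOR_TEMPLATES = (
    "meta[property='{}']",
    "meta[name='{}']",
    "meta[itemprop='{}']",
)


def build_selectors(tag_name):
    """Return one CSS selector per template for the given meta tag name."""
    return [template.format(tag_name) for template in SELECTOR_TEMPLATES]


if __name__ == "__main__":
    for selector in build_selectors("article:published_time"):
        print(selector)
```

Each resulting string can be passed to any CSS-selector-capable HTML parser; which attribute a site actually uses (`property`, `name` or `itemprop`) varies, so trying all three is the safe default.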
Meta tags for publication and modification date:
published_date
published_time
cXenseParse:publishtime
pubdate
publish_date
PublishDate
dcterms.created
rnews:datePublished
article:published_time
prism.publicationDate
displaydate
OriginalPublicationDate
og:published_time
datePublished
article_date_original
article.published
published_time_telegram
sailthru.date
date
Date
original-publish-date
DC.date.issued
dc.date
DC.Date
parsely-pub-date
publishtime
publication_date
uploadDate
coverageEndTime
publishdate
publish-date
publishedAtDate
dcterms.date
publishedDate
creationDateTime
pub_date
updated_time
og:updated_time
datemodified
last-modified
Last-Modified
DC.date.modified
article:modified_time
modified_time
modifiedDateTime
dc.dcterms.modified
lastmod
Meta tags for title:
dc.title
og:title
headline
articletitle
article-title
parsely-title
title
Meta tags for description:
description
og:description
Meta tags for body:
articleBody
articleText
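The lookup described above could be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions, not the project's implementation: `MetaCollector`, `first_match`, `extract`, and the shortened `TITLE_TAGS` and `DATE_TAGS` lists are hypothetical names, and the lists are small subsets of the tags above, checked in order.

```python
from html.parser import HTMLParser

# Illustrative subsets of the tag lists above, in priority order (assumed order).
TITLE_TAGS = ["og:title", "dc.title", "headline", "title"]
DATE_TAGS = ["article:published_time", "datePublished", "pubdate", "date"]


class MetaCollector(HTMLParser):
    """Collect <meta> content values keyed by property, name or itemprop."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        content = attr_map.get("content")
        if not content:
            return
        for key_attr in ("property", "name", "itemprop"):
            key = attr_map.get(key_attr)
            if key:
                # Keep the first value seen for each key (matching is case-sensitive).
                self.meta.setdefault(key, content)


def first_match(meta, candidates):
    """Return the value of the first candidate tag present, or None."""
    for name in candidates:
        if name in meta:
            return meta[name]
    return None


def extract(html):
    """Return the title and publication date found in the page's meta tags."""
    collector = MetaCollector()
    collector.feed(html)
    return {
        "title": first_match(collector.meta, TITLE_TAGS),
        "date": first_match(collector.meta, DATE_TAGS),
    }


SAMPLE = """<html><head>
<meta property="og:title" content="Example headline">
<meta itemprop="datePublished" content="2024-01-02T03:04:05Z">
</head></html>"""

if __name__ == "__main__":
    print(extract(SAMPLE))
```

The date is returned as the raw string from the page; parsing it into a datetime is left out because the formats differ from site to site.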
FYI
It would be good if the code could be fixed, improved or adapted so that it extracts full information from the websites below, since they are among the most popular news websites in the world.
By "full information" I mean the title, publication date and article body.
CNN - https://edition.cnn.com/
BBC News - https://www.bbc.com/news
Reuters - https://www.reuters.com/
The New York Times - https://www.nytimes.com/
The Guardian - https://www.theguardian.com/international
Al Jazeera - https://www.aljazeera.com/
Associated Press (AP) News - https://apnews.com/
NBC News - https://www.nbcnews.com/
Fox News - https://www.foxnews.com/
USA Today - https://www.usatoday.com/
ABC News - https://abcnews.go.com/
CBS News - https://www.cbsnews.com/
The Washington Post - https://www.washingtonpost.com/
Time - https://time.com/
Forbes - https://www.forbes.com/
Bloomberg - https://www.bloomberg.com/
The Wall Street Journal - https://www.wsj.com/
The Huffington Post - https://www.huffpost.com/
The Independent - https://www.independent.co.uk/
The Sydney Morning Herald - https://www.smh.com.au/
The Economist - https://www.economist.com/
The Times of India - https://timesofindia.indiatimes.com/
The Daily Mail - https://www.dailymail.co.uk/home/index.html
The Telegraph - https://www.telegraph.co.uk/
The Sun - https://www.thesun.co.uk/
The Mirror - https://www.mirror.co.uk/
The Daily Beast - https://www.thedailybeast.com/
The Atlantic - https://www.theatlantic.com/
National Geographic - https://www.nationalgeographic.com/
Science Daily - https://www.sciencedaily.com/
The Verge - https://www.theverge.com/
Wired - https://www.wired.com/
TechCrunch - https://techcrunch.com/
Engadget - https://www.engadget.com/
Mashable - https://mashable.com/
Forbes India - https://www.forbesindia.com/
Hindustan Times - https://www.hindustantimes.com/
CNN Business - https://www.cnn.com/business
Financial Times - https://www.ft.com/
CNBC - https://www.cnbc.com/
Business Insider - https://www.businessinsider.com/
Politico - https://www.politico.eu/
The Hill - https://thehill.com/
The Washington Times - https://www.washingtontimes.com/
The Boston Globe - https://www.bostonglobe.com/
The LA Times - https://www.latimes.com/
The Chicago Tribune - https://www.chicagotribune.com/
The Globe and Mail - https://www.theglobeandmail.com/
The Toronto Star - https://www.thestar.com/