Recently Orphaned Newspapers: From Archived Webpages to Reusable Datasets and Research Outlooks
https://pid.depositar.io/ark:37281/k5p3h9k37
We convert the web archives of a recently orphaned newspaper into accessible article collections in the IPTC standard format for news representation. We focus on Taiwan's Apple Daily: starting from the WARC files built by the Archive Team, we convert them into de-duplicated collections of plain text in ninjs (News in JSON) format...
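To make the conversion step concrete, here is a minimal sketch of de-duplicating extracted articles and emitting them as ninjs-style JSON. It is not the authors' pipeline: it assumes the article text has already been pulled out of the WARC records, the helper names `to_ninjs` and `deduplicate`, the sample URLs, and the `zh-Hant` language tag are all hypothetical, and de-duplication by hashing whitespace-normalized body text is just one plausible approach. Only the field names (`uri`, `type`, `headline`, `body_text`, `language`) come from the ninjs schema.

```python
import hashlib
import json


def to_ninjs(uri, headline, body_text, language="zh-Hant"):
    """Build a minimal ninjs-style item (hypothetical helper, not the authors' code)."""
    return {
        "uri": uri,              # identifier of the news item
        "type": "text",          # ninjs item type for plain articles
        "headline": headline,
        "body_text": body_text,  # plain-text body, markup already stripped
        "language": language,    # assumed language tag
    }


def deduplicate(items):
    """Keep the first item for each distinct whitespace-normalized body text."""
    seen = set()
    unique = []
    for item in items:
        normalized = " ".join(item["body_text"].split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique


# Made-up sample pages: the same article captured under two URLs, plus one other.
pages = [
    ("https://example.org/news/1", "Headline A", "Body of article A."),
    ("https://example.org/news/1?ref=home", "Headline A", "Body  of article A.\n"),
    ("https://example.org/news/2", "Headline B", "Body of article B."),
]

collection = deduplicate([to_ninjs(*page) for page in pages])
print(json.dumps(collection, ensure_ascii=False, indent=2))
```

Hashing the normalized text rather than the URL means the same article archived under several addresses collapses to one entry, at the cost of missing near-duplicates that differ by even one character.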
-
Thanks, team! Please follow us at
@depositardepositar 研究資料寄存所 (Research Data Repository) ❤️



