Recently Orphaned Newspapers: From Archived Webpages to Reusable Datasets and Research Outlooks

pid.depositar.io/ark:37281/k5p

We convert the web archives of a recently orphaned newspaper into accessible article collections in the IPTC standard format for news representation. Focusing on Taiwan's Apple Daily, we start from the WARC files built by the Archive Team and convert them into de-duplicated, plain-text collections in ninjs (News in JSON) format...
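As a rough illustration of the de-duplication and ninjs steps described in the abstract, here is a minimal sketch. It assumes the article text has already been extracted from the WARC files; the input pages, the `to_ninjs` and `deduplicate` helpers, and the exact-match hashing strategy are all hypothetical and not taken from the project itself. Only a handful of ninjs properties (`uri`, `type`, `language`, `headline`, `body_text`) are filled in.

```python
import hashlib
import json


def to_ninjs(url, headline, text, language="zh-Hant"):
    """Build a minimal ninjs-style record with a few common properties."""
    return {
        "uri": url,
        "type": "text",
        "language": language,
        "headline": headline,
        "body_text": text,
    }


def deduplicate(pages):
    """Keep the first page for each distinct body text.

    Texts are compared by the SHA-256 digest of their
    whitespace-normalised form (an assumed strategy, exact matches only).
    """
    seen = set()
    unique = []
    for page in pages:
        normalised = " ".join(page["text"].split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same article captured more than once
        seen.add(digest)
        unique.append(to_ninjs(page["url"], page["headline"], normalised))
    return unique


# Hypothetical pages, standing in for text extracted from WARC records.
pages = [
    {"url": "https://example.com/a/1", "headline": "Story A", "text": "Body of  story A."},
    {"url": "https://example.com/a/1?ref=home", "headline": "Story A", "text": "Body of story A."},
    {"url": "https://example.com/b/2", "headline": "Story B", "text": "Body of story B."},
]

records = deduplicate(pages)
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Exact hashing only catches identical captures; near-duplicates such as the same article with a different footer would need a fuzzier comparison.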

Thanks, team! Please follow us at @depositar (depositar 研究資料寄存所) ❤️