Add Squidwarc to the list (#29)

2017-07-24 22:24:44 -04:00
parent d7fd3167a2
commit 3505b572dd
1 changed file with 4 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ An [Awesome List](https://github.com/sindresorhus/awesome/blob/master/awesome.md

 Please ensure your pull request adheres to the following guidelines:

-* Use the following format: 
+* Use the following format:
   * `[Name](link)` (Status: **Stable** or **In Development**) - Brief Description of what the module does
 * Make an individual pull request for each new item.
 * Link additions should be inserted alphabetically to the relevant category.
@@ -64,7 +64,7 @@ This list of tools and software is intended to briefly describe some of the most
 * [Brozzler](https://github.com/internetarchive/brozzler) (Stable) - A distributed web crawler (爬虫) that uses a real browser (chrome or chromium) to fetch pages and embedded urls and to extract links.

 * [F(b)arc](https://github.com/justinlittman/fbarc) (Stable) - A commandline tool and Python library for archiving data from [Facebook](https://www.facebook.com/) using the [Graph API](https://developers.facebook.com/docs/graph-api).
- 
+
 * [grab-site](https://github.com/ludios/grab-site) (Stable) - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns.

 * [Heritrix](https://webarchive.jira.com/wiki/display/Heritrix/Heritrix) (Stable) - An open source, extensible, web-scale, archival quality web crawler.
@@ -77,6 +77,8 @@ This list of tools and software is intended to briefly describe some of the most

 * [SiteStory](http://mementoweb.github.com/SiteStory/) (Stable) - A transactional archive that selectively captures and stores transactions that take place between a web client (browser) and a web server.

+* [Squidwarc](https://github.com/N0taN3rd/Squidwarc) (In Development) - An [open source, high-fidelity, page interacting](http://ws-dl.blogspot.com/2017/07/2017-07-24-replacing-heritrix-with.html) archival crawler that uses Chrome or Chrome Headless directly.
+
 * [twarc](https://github.com/docnow/twarc) (Stable) - A command line tool and Python library for archiving Twitter JSON data.

 * [WARCreate](http://matkelly.com/warcreate/) (Stable) - A [Google Chrome](https://www.google.com/intl/en/chrome/browser/) extension for archiving an individual webpage or website to a WARC file.