5 Repositories
Java Crawler Libraries
Apache Nutch is an extensible and scalable web crawler
Apache Nutch README: For the latest information about Nutch, please visit our website at https://nutch.apache.org/ and our wiki at https://cwiki.apa
A scalable, mature and versatile web crawler based on Apache Storm
StormCrawler is an open source collection of resources for building low-latency, scalable web crawlers on Apache Storm. It is provided under Apache Li
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Sparkler: A web crawler is a bot program that fetches resources from the web for the sake of building applications like search engines, knowledge bases
A scalable web crawler framework for Java.
A scalable crawler framework. It covers the whole lifecycle of a crawler: downloading, URL management, content extraction and persiste
Open Source Web Crawler for Java
crawler4j is an open source web crawler for Java that provides a simple interface for crawling the Web. Using it, you can set up a multi-thr