Web-as-corpus tools in Java.
* Simple crawler, with integration for Nutch and Heritrix
* HTML cleaner that removes boilerplate content
* Language recognition
* Corpus builder

License

Apache License V2.0

Additional Project Details

Intended Audience

Science/Research

User Interface

Web-based, Non-interactive (Daemon)

Programming Language

Java

Related Categories

Java Search Engines, Java Frameworks, Java Intelligent Agents, Java Information Analysis Software

Registered

2008-04-11