pdf becomes an open format
Monday January 29th 2007, 11:35 am
adobe announced today that it intends to release the full pdf (portable document format) 1.7 specification to AIIM, the enterprise content management association, with the goal of having it published as an ISO standard. this is really good news for open formats!
nutch bundle for eclipse
Monday January 15th 2007, 8:21 pm
a while ago, i created a howto wiki page for running nutch in eclipse. many people still seem to have questions, so here’s a bundled version of nutch-0.8 that can be dropped into eclipse. you should be able to start crawling within 2 minutes.
requirements
- eclipse 3.2
- java 1.4 or higher, tested with 1.5
- tested with ubuntu and win xp
import project into eclipse
- From the “File” menu select “Import…”, then “General”, “Existing Projects into Workspace”, and click “Next >”.
- Click “Browse…” next to “Select root directory” and navigate to the root of this document. Click “Open”.
- Click “Finish” and the Package Explorer will show the project.
configure
- change the value CHANGE
- NUTCH WILL NOT RUN OTHERWISE
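since a forgotten placeholder stops nutch from running, a quick pre-flight check can help. the sketch below is not part of the bundle: it scans whatever file you pass it and reports lines that still contain the literal text CHANGE. the class name PlaceholderCheck is made up for illustration, and the config file path is deliberately left as an argument. it uses newer java apis, so run it with a current jdk rather than the 1.4/1.5 the bundle targets.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not shipped with nutch or the bundle.
public class PlaceholderCheck {

    // Returns the 1-based numbers of all lines that contain the placeholder text.
    public static List<Integer> findPlaceholder(List<String> lines, String placeholder) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(placeholder)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PlaceholderCheck <config-file>");
            System.exit(2);
        }
        Path config = Paths.get(args[0]);
        List<Integer> hits = findPlaceholder(Files.readAllLines(config), "CHANGE");
        if (hits.isEmpty()) {
            System.out.println("ok: no CHANGE placeholder left in " + config);
        } else {
            System.out.println("still unchanged on line(s) " + hits + " of " + config);
            System.exit(1);
        }
    }
}
```

the match is a plain substring test, so any other text containing CHANGE in upper case is flagged as well.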
run it
- crawl: menu “Run”, “Run…” then double click on “Crawl” on the left list
- search: menu “Run”, “Run…” then double click on “SearchBean”
- by default, nutch is set up to crawl http://www.cnn.com and http://www.nytimes.com/
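if you want to crawl other sites: nutch 0.8's crawl tool reads its seeds from plain text files with one url per line. here is a minimal sketch that writes such a file. where the bundle keeps its own seed list is not stated here, so the class name SeedList and the default output file seeds.txt are placeholders of mine, and the url filter configuration may also need to allow the new hosts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not shipped with nutch or the bundle.
public class SeedList {

    // Renders the seeds as plain text, one trimmed url per line, skipping blank entries.
    public static String render(List<String> urls) {
        StringBuilder sb = new StringBuilder();
        for (String url : urls) {
            String trimmed = url.trim();
            if (!trimmed.isEmpty()) {
                sb.append(trimmed).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed output location; pass the real seed file path as the first argument.
        Path seedFile = Paths.get(args.length > 0 ? args[0] : "seeds.txt");
        // The two default sites named in this post; replace with your own.
        List<String> urls = Arrays.asList("http://www.cnn.com", "http://www.nytimes.com/");
        Files.write(seedFile, render(urls).getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + urls.size() + " seed url(s) to " + seedFile);
    }
}
```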
more info
please don’t contact me directly for free support; use the mailing list instead.