
Sciencenet Search engine based on YACY p2p technology

by Urban Liebel last modified 2008-01-26 22:10

We have just launched a free, distributed search engine for scientific knowledge, based on the YACY peer-to-peer technology (Michael Christen).

The KIT search engine initiative Sciencenet is a distributed search engine for scientific knowledge. We have started indexing the scientific web with standard desktop PCs at KIT.

  • You can test the current search results at sciencenet.fzk.de (alpha).
  • You can download the free YACY software (Java) and contribute your own search peer (see below).

Introduction:

Current search engines rank results by popularity and/or sponsored links, and their index is often outdated. This makes it difficult for scientists, students and teachers to find up-to-date scientific information. Many interesting scientific (lab) websites are simply not popular enough for Google & Co.

Problem:

Large-scale search engines are "resource hungry". Thousands of computers are needed to create a proper and fast index on a global scale.

The idea:
This is where the YACY peer-to-peer (p2p) technology comes in very handy.

Many standard PCs distributed across the globe share the index of a large search engine.
Every PC keeps a fraction of the search engine index. The more PCs are connected, the more pages can be indexed, stored and retrieved.
Currently a single YACY installation (on a standard PC) is good enough for 10 million web pages. The faster the PC (and its disks), the more pages can be searched in a reasonable amount of time.
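
To make the idea of every PC keeping a fraction of the index more concrete, here is a minimal Python sketch that assigns each search term to exactly one peer by hashing the term. It is an illustration only and not YACY's actual distribution scheme: the function names, the peer names and the simple hash-modulo rule are assumptions made for this example.

```python
import hashlib


def peer_for_term(term, peers):
    """Pick the peer responsible for a search term by hashing the term."""
    # Lower-case the term so "Genome" and "genome" land on the same peer.
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a number, then map it to a peer slot.
    slot = int.from_bytes(digest[:8], "big") % len(peers)
    return peers[slot]


def partition_index(terms, peers):
    """Split a list of terms into one shard per peer."""
    shards = {peer: [] for peer in peers}
    for term in terms:
        shards[peer_for_term(term, peers)].append(term)
    return shards


if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c"]  # hypothetical peer names
    terms = ["genome", "protein", "microscopy", "zebrafish"]
    for peer, held in partition_index(terms, peers).items():
        print(peer, held)
```

With such a rule, adding peers spreads the same terms over more machines, so each one holds a smaller share. A real network also has to cope with peers joining and leaving, which this sketch ignores.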

Ideally, every research institute runs its own search peer (or several), keeping the local part of the index up to date. If Google or any other search engine crawled the entire internet on a daily basis, no bandwidth would be left for anything else.

Your contribution:

If you want to help index universities, research facilities or other scientifically relevant sites, please feel free to download your own free Sciencenet YACY software here.

Status of the network:

* Sciencenet-Network overview
* Search engine interface: http://sciencenet.fzk.de

If you don't find your institute, your university or your favourite scientific website in the index, it is about time to download the client and support the network.

1) If not done already, download Java (http://www.java.com)
2) Download YACY Sciencenet: http://harvester.fzk.de/yacy.zip
3) Give your search peer a name
4) Optional: index your favourite scientific website via the "Crawl start" menu (e.g. www.myuniversity.org)


Requirements:

* Dedicated PC with ANY OS (YACY is Java software and therefore runs on all operating systems)
* Java, installed from http://www.java.com
* At least 1.5 GByte RAM
* Ideally your peer runs on port 8080 and is "visible from outside".
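
One way to check the last requirement is to try a plain TCP connection to the peer's port. The following is a minimal sketch using only Python's standard library; the function name and the default timeout are assumptions for this example, and the check only tests whether something accepts connections on the port, not whether it is actually a YACY peer.

```python
import socket


def port_is_open(host, port=8080, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost")` on the peer machine only shows that the peer is listening locally. To test whether it is "visible from outside", run the check from a computer outside your own network, passing your peer's public host name or IP address.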

YACY has been tested to run with thousands of PCs in a network. If you want to contribute more than one peer, just go ahead.

Although setting up and running the Sciencenet YACY software is very simple (Java, runs on all OSs), it allows you to configure pretty much everything. You can even start your own crawl session by pointing it to your favourite scientific website.

Have fun...


  • On the right you see some ultra-cheap peers for the Sciencenet (YaCy-Cluster-Center).

16 standard PCs are already good enough for 100 million web pages.
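
As a rough sanity check of this figure, the short Python sketch below works through the arithmetic. It assumes the per-PC capacity of 10 million pages quoted earlier in the text and treats capacity as simply additive across peers, which is a simplification; the function names are made up for this example.

```python
PAGES_PER_PEER = 10_000_000  # per-PC figure quoted earlier in the text


def peers_needed(total_pages, pages_per_peer=PAGES_PER_PEER):
    """Smallest number of peers whose combined capacity covers total_pages."""
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_pages // pages_per_peer)


def cluster_capacity(peers, pages_per_peer=PAGES_PER_PEER):
    """Combined page capacity, assuming capacity simply adds up per peer."""
    return peers * pages_per_peer


if __name__ == "__main__":
    print(peers_needed(100_000_000))   # → 10
    print(cluster_capacity(16))        # → 160000000
```

Under these assumptions, 10 peers would be the bare minimum for 100 million pages, so a cluster of 16 leaves some headroom for speed and for peers that are temporarily offline.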

  • If you are interested in contributing to the GLOBAL "all internet" search engine initiative using YACY, see the YACY project page http://www.yacy.net and download the "YACY Freenet" client. Both software versions (YACY-sciencenet and YACY-freenet) are identical except for the "yacy.init" file, which defines the peer's network.
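
If you want to see for yourself where two such "yacy.init" files differ, a small comparison script is enough. The Python sketch below assumes the file consists of simple "key = value" lines with "#" comments; the function names and the keys in the demo are placeholders invented for this example, not real YACY settings.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def differing_keys(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Placeholder contents; in practice, read the two yacy.init files instead.
    first = parse_properties("# demo\nexample.network = network-a\nexample.other = same\n")
    second = parse_properties("example.network = network-b\nexample.other = same\n")
    for key, (left, right) in differing_keys(first, second).items():
        print(key, left, right)
```

To compare real files, read each one with `open(path, encoding="utf-8").read()` and pass the text to `parse_properties`.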















