Wednesday, August 08, 2007

busy yet ..
I am installing a fresh dump of Wikipedia for the college LAN. After a few failures with xml2sql and similar tools, I am now using mwdumper and loading the data directly into MySQL. Here is one of the best pages that I found for installing Wikipedia on a local computer. PS: I am working on Fedora 7. Windows users may find this useful. Other than this, there is also Webaroo for Windows, which I have even tested on Windows Vista. Webaroo uses about 5.5 GB of disk space, whereas the actual Wikipedia dump for August is about 12 GB in XML format and approximately 2.6 GB in bz2 format. The latest dumps can be downloaded from here. As to what you should download from there, read this.



Another important thing to install for offline Wikipedia usage is MediaWiki.


Here is an easy way of installing Wikipedia for offline usage on a Fedora-based system.


Steps to get mwdumper to work on Red Hat and Fedora Core Linux distros with MySQL 5.x

* 1. Destroy any existing database: i.e. mysqladmin drop DBNAME -p (replace DBNAME with the name of your wiki database; enter root db password)
* 2. Recreate the database: i.e. mysqladmin create DBNAME -p (enter root db password)
* 3. DO NOT install MediaWiki until the database is uploaded. If MediaWiki is installed, mwdumper may not work. The unpacked MediaWiki files are still needed for the next step; just do not run the installer yet.
* 4. Run the tables.sql script supplied with MediaWiki from the MediaWiki root directory: i.e. mysql -u root -p DBNAME < maintenance/tables.sql
* 5. Start mwdumper, giving it the dump file to read: i.e. java -jar mwdumper.jar --format=sql:1.5 DUMPFILE | mysql -u root -p DBNAME
* 6. You will have to manually grant admin status to the WikiSysop account. Either run the upgrade.php script on the MediaWiki database to obtain WikiSysop access, or install over the previous MediaWiki installation and import the databases to activate the WikiSysop account. Add the following MediaWiki PHP file to your /maintenance directory by cutting and pasting the text, and name the file createBcrat.php, then recreate the WikiSysop account by executing the example commands provided. The attached PHP file is for the MediaWiki 1.9.3 release.
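
The sequence above can be collected into a small shell script. This is only a sketch under my own assumptions: the database name wikidb, the dump file name pages.xml.bz2, and the DRY_RUN switch are placeholders that do not come from the original instructions. By default it only prints the commands so you can check them before anything is dropped.

```shell
#!/bin/sh
# Sketch of the import sequence. DB_NAME, DUMP_FILE and DRY_RUN are
# illustrative placeholders; adjust them for your own system.
DB_NAME="${DB_NAME:-wikidb}"
DUMP_FILE="${DUMP_FILE:-pages.xml.bz2}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command line; execute it through the shell only when DRY_RUN=0.
step() {
    echo "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$1"
    fi
}

# Run from the MediaWiki root directory so maintenance/tables.sql resolves.
# The chain stops at the first command that fails.
import_wiki() {
    step "mysqladmin -u root -p drop $DB_NAME" &&
    step "mysqladmin -u root -p create $DB_NAME" &&
    step "mysql -u root -p $DB_NAME < maintenance/tables.sql" &&
    step "java -jar mwdumper.jar --format=sql:1.5 $DUMP_FILE | mysql -u root -p $DB_NAME"
}
```

After sourcing the file, calling import_wiki prints the four commands; calling it as DRY_RUN=0 import_wiki actually runs them, and each mysql or mysqladmin call will prompt for the root db password.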

file createBcrat.php


<?php
/**
 * Maintenance script to create an account and grant it administrator and bureaucrat group membership
 *
 * @package MediaWiki
 * @subpackage Maintenance
 * @author Rob Church
 * @author Jeff Merkey
 */

require_once( 'commandLine.inc' );

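# Require at least a username. The password is optional because it can be
# set afterwards with changePassword.php.
if( count( $args ) < 1 ) {
    echo( "Please provide a username (and optionally a password) for the new account.\n" );
    die( 1 );
}

$username = $args[0];
$password = isset( $args[1] ) ? $args[1] : '';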

echo( wfWikiID() . ": Creating and promoting User:{$username}..." );

# Validate username and check it doesn't exist
$user = User::newFromName( $username );
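if( !is_object( $user ) ) {
    echo( "invalid username.\n" );
    die( 1 );
} elseif( 0 != $user->idForName() ) {
    # Existing account: just promote it. No new user is created here, so the
    # site_stats counter is left alone ($ssu is not defined at this point).
    echo( "account exists.\n" );
    $user->addGroup( 'sysop' );
    $user->addGroup( 'bureaucrat' );
    echo( "done.\n" );
    die( 0 );
}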

# Insert the account into the database
$user->addToDatabase();
$user->setPassword( $password );
$user->setToken();

# Promote user
$user->addGroup( 'sysop' );
$user->addGroup( 'bureaucrat' );

# Increment site_stats.ss_users
$ssu = new SiteStatsUpdate( 0, 0, 0, 0, 1 );
$ssu->doUpdate();

echo( "done.\n" );

?>

Steps to recreate WikiSysop and add the account to groups "sysop" and "bureaucrat"

From your MediaWiki root directory, enter the following commands:

php maintenance/createBcrat.php WikiSysop
php maintenance/changePassword.php --user=WikiSysop --password=

You may have to enter the password twice for the account to work properly, which is why changePassword is called after the account has been recreated and assigned sysop and bureaucrat status.


mwdumper is not the right tool if you want to maintain an existing wiki. It may not always work correctly if the MediaWiki databases have already been installed on the Fedora Core releases, and it may not give useful output about any errors that occur. Most of these problems come from the underlying MySQL version rejecting records and SQL insert requests. You may wish to do a trial run of mwdumper on your particular OS distribution to see whether you encounter any of these problems. There are fixes for some of these issues.
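
For such a trial run, one rough sanity check is to compare the number of pages in the XML dump with the number of rows that ended up in the database. The helper below is a sketch of the counting half only. It assumes each <page> opening tag sits on its own line, as it does in the standard dumps, and the function name count_dump_pages is made up for this example.

```shell
#!/bin/sh
# Count <page> elements in a MediaWiki XML dump (plain XML or .bz2).
# Assumes one <page> opening tag per line; count_dump_pages is a
# hypothetical helper name, not part of mwdumper or MediaWiki.
count_dump_pages() {
    case "$1" in
        *.bz2) bzcat "$1" | grep -c '<page>' ;;
        *)     grep -c '<page>' "$1" ;;
    esac
}
```

The result can then be compared by hand with the output of SELECT COUNT(*) FROM page; run in the mysql client against your wiki database (the table name differs if you configured a table prefix). A large gap between the two numbers suggests the import stopped early.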

Known Problems

* mwdumper will fail with: ERROR 1153 (08S01) at line 2187: Got a packet bigger than 'max_allowed_packet' bytes if your dumps contain large sections of Unicode text, such as Cherokee, and in most cases it does not work at all with these dumps, even with the MySQL defaults set to utf8. One solution is to increase the maximum packet (request) size that MySQL will accept for INSERT commands. Try setting set-variable = max_allowed_packet=20M in your /etc/my.cnf file and restarting the mysqld program.
* mwdumper does not report errors when uploading to a system with a database that is not freshly created.
* mwdumper may not always complete the dump, even though it is reporting that it is and even if you have followed all the procedures listed here. Due to the lack of proper error handling in the program, it may be better to just run importDump.php if you encounter problems with this tool.
* If you run into problems using mwdumper to input directly into mysql on a particular Linux distribution or version of the MySQL database, consider setting up mwdumper to convert the XML dumps into an intermediate .sql file, then import the output file directly into mysql rather than allowing mwdumper to do so.
* Try passing the '-f' switch (force) to mysql to force record insertions into your MySQL database if mysql starts rejecting updates from the mwdumper program or reports duplicate key errors.
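
The last two workarounds can be combined into a two-stage import. The following is a minimal sketch, assuming a database called wikidb, a dump called pages.xml.bz2, and an intermediate file pages.sql; all three names and both function names are placeholders of mine.

```shell
#!/bin/sh
# Two-stage import: convert the XML dump to SQL first, then load the file.
# DB_NAME, DUMP_FILE and SQL_FILE are illustrative placeholders.
DB_NAME="${DB_NAME:-wikidb}"
DUMP_FILE="${DUMP_FILE:-pages.xml.bz2}"
SQL_FILE="${SQL_FILE:-pages.sql}"

# Stage 1: write mwdumper's SQL output to an intermediate file.
convert_dump() {
    java -jar mwdumper.jar --format=sql:1.5 "$DUMP_FILE" > "$SQL_FILE"
}

# Stage 2: load the file into MySQL. Pass "force" as the first argument
# to add -f, so mysql keeps going after a rejected statement.
load_sql() {
    if [ "$1" = "force" ]; then
        mysql -f -u root -p "$DB_NAME" < "$SQL_FILE"
    else
        mysql -u root -p "$DB_NAME" < "$SQL_FILE"
    fi
}
```

Running convert_dump and then load_sql keeps the slow conversion separate from the load, so a failed load can be retried (for example with load_sql force) without converting the dump again. Keep in mind that -f hides the rejected statements rather than fixing them, so it is worth checking the result afterwards.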
