Add to MyYahoo!
Subscribe in NewsGator Online
Add to Newsburst
Add to Google
Add to My AOL
Add to Pluck
Subscribe in FeedLounge
Add to Windows Live
Add to NetVibes
Subscribe in Rojo
Subscribe in Bloglines
Add to MyMSN
Add to Plusmo for your cellphone
Add to PageFlakes
Add to Technorati
Add to BlinkBits
changing architecture of component Print E-mail
User Rating: / 1
PoorBest 
Tuesday, 27 September 2005 21:11

As I am in the middle of the development (a little bit more than 60% done) of my PHP Bayesian Naive Filter (a learning filter against spams comment, guestbook, and posting in general) for Joomla/Mambo and after reading some paper found on internet:

  • On Attacking Statistical Spam Filters, Gregory L. Wittel and S. Felix Wu - Department of Computer Science - University of California, Davis One Shields Avenue, Davis, CA 95616 USA
  • A Naive Bayes Spam Filter, Kai Wei This e-mail address is being protected from spambots. You need JavaScript enabled to view it CS281A Project, Fall 2003
  • But there is more...

I decide that my project will be certainly a failure if I rewrite or reuse a Bayesian filter engine which is not accurate or using the latest countermeasures. Since I do not want to develop during 3 years an effective filter (will I ever be able to do it???), I came across the idea of implementing the component com_bayesiannaivefilter in such a way that I can abstract the core engine and use the work done by the best open source project.

It is also clear for me since the beginning that a spam filter must be trained on a very large data volume (more than 1000 messages, the more the better) in order to categorize the message with accuracy. Webservices will have my preference as an internet entities with the require cpu horsepower and data store should be able to offer the best categorizing messages efficiency....

My component will be able to use following Bayesian Naive Filter core: (planned but not done, I it is technically possible I will do it)

Plugins Remarqs Possible open source project
JAVA I am a J2EE developer, Back to the roots :-) Som. to propose? contact me!
PHP Core done, but very simple tokenization and hashing of message
Volume of data small
Som. to propose? contact me!
PERL Can PHP call perl code? Som. to propose? contact me!
CGI-BIN Should be easy to do Som. to propose? contact me!
WEBSERVICES Should be easy as soon as we found a WS provider
Data volume?
Som. to propose? contact me!

Each technology may contains many core engine, or different versions. I will fill this table with possible candidate (You can heelp me by suggesting or speeding development).

Core requirement:

  • Use mySQL,
  • Most of internet provider allow the use of CGI-BIN, PERL, JAVA

This project will be soon committed to Joomla forge!

Tags See All Tags Add New Tag...

Please Enter New Tags Separated By Comma's
  Or Close


Powered By Joomla Tags

Comments
Add New Search RSS
Write comment
Name:
Email:
 
Title:
UBBCode:
[b] [i] [u] [url] [quote] [code] [img] 
 
:):grin;)8):p:roll:eek:upset:zzz:sigh:?:cry
:(:x
Please input the anti-spam code that you can read in the image.

3.20 Copyright (C) 2007 Alain Georgette / Copyright (C) 2006 Frantisek Hliva. All rights reserved."

 


Another articles:


Content View Hits : 2926658

Enter Amount: