Machine Learning
This page contains library code that can be used for machine learning tasks. The focus at this point is learning on text-based input.
The following has been implemented:
- Distance metrics - Euclidean, Pearson correlation coefficient
- Generic array and file operations useful in handling input data
- A library to prepare input text and convert it into a word matrix, on which different algorithms can be applied
- An agglomerative hierarchical clustering algorithm
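As a rough illustration of the two distance metrics listed above, here is a minimal Python sketch. It is not taken from the library itself: the function names (euclidean_distance, pearson_correlation, pearson_distance) are hypothetical, and turning the Pearson coefficient into a distance as 1 - r is one common convention, assumed here.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation coefficient r, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denominator = math.sqrt(var_a * var_b)
    if denominator == 0:
        # A constant vector has no defined correlation; treat it as 0 here.
        return 0.0
    return covariance / denominator


def pearson_distance(a, b):
    """Assumed convention: 0 means perfectly correlated, 2 means opposite."""
    return 1.0 - pearson_correlation(a, b)
```

Euclidean distance is sensitive to how long a document is, since longer texts have larger word counts. A correlation-based distance compares the shape of the two count vectors instead, which is often a better fit for text.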
Download or check out from:
http://208.78.102.143/svn/ml/trunk/
A clustering example based on sample Wikipedia text has been implemented. The input data is a set of nine files, each containing text about one of three selected topics (Biology, Operating Systems, Computer memory). The input files and the corresponding clustered output are available here.
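The general idea behind such an example can be sketched in a few lines of Python. This is an assumption-laden illustration, not the library's code: build_word_matrix and agglomerative_clusters are hypothetical names, and the tokenizer (lowercase letters only), the Euclidean distance, the average linkage, and stopping at k clusters are all choices made for this sketch.

```python
import math
import re
from collections import Counter


def build_word_matrix(documents):
    """Return (vocabulary, rows): one row of word counts per document."""
    counts = [Counter(re.findall(r"[a-z]+", text.lower())) for text in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(rows, k, distance=euclidean_distance):
    """Repeatedly merge the two closest clusters until only k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Start with every document in its own cluster (lists of row indices).
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between members.
        total = sum(distance(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Four made-up snippets standing in for the Wikipedia input files.
docs = [
    "cell biology cell organism",
    "organism cell biology",
    "kernel process scheduler kernel",
    "process kernel scheduler",
]
vocabulary, rows = build_word_matrix(docs)
print(agglomerative_clusters(rows, 2))  # [[0, 1], [2, 3]]
```

A full hierarchical implementation would normally keep the whole merge tree rather than stop at a fixed number of clusters. The sketch stops early only to keep the output easy to read.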
Planned
- Other distance metrics - Manhattan distance, Jaccard coefficient
- Identifying similarity - Recommendations via collaborative filtering
- More unsupervised learning algorithms like k-means and column clustering
- Start on supervised learning algorithms
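For reference, the two planned distance metrics have simple standard definitions. The Python sketch below is not part of the library. The function names are hypothetical, and treating each document as a set of words for the Jaccard coefficient is an assumption.

```python
def manhattan_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_coefficient(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty sets are treated as identical in this sketch.
        return 1.0
    return len(set_a & set_b) / len(union)
```

The Jaccard coefficient is a similarity (1 means identical sets), so a matching distance would be 1 minus the coefficient.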
Bugs, suggestions or feedback? Please send them to thejo@kote.in
Last updated: August, 2008