Now-redundant weka mirror. Visit https://github.com/Waikato/weka-trunk for the real deal.

Overview

weka (mirror)

Computing and Mathematical Sciences at the University of Waikato now has an official GitHub organization, including a read-only git mirror of Weka's subversion repository. Therefore, this repo is no longer necessary and will one day be removed. In the meantime, please follow the Waikato repo for the most up-to-date and official changes. Additionally, Waikato also maintains a curated list of repositories you may find interesting. Enjoy.

The official git mirror: https://github.com/Waikato/weka-trunk

(Official README) WEKA (developer version)

Read-only git mirror of Weka's subversion repository.

Source code

The official WEKA source code of the developer version is available from this URL:

https://svn.cms.waikato.ac.nz/svn/weka/trunk/

Contributions/Bug fixes

Contributions and bug fixes can be submitted as patch files posted to the WEKA mailing list.


A few notes from the unofficial mirror

NOTE The owner of this repository has no affiliation with the official WEKA project. This repo is periodically updated as a kindness to others who have shown interest in it. It can take several hours to check out the full official WEKA subversion repository and several minutes just to update it with any new commits. Therefore, this repo exists to provide an easy way to access and peruse the WEKA source using git, nothing more.


This is a git mirror of The University of Waikato machine learning project WEKA.

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

The official WEKA source code is hosted using subversion at the Waikato SVN server.

The current version of WEKA is licensed under the GNU General Public License version 3.0.

Use the weka-trunk branch to follow the official code base under active development at the University of Waikato.

Comments
  • Using EM clustering with weka in my JAVA code?

    I applied EM clustering in Weka to cluster some points (x, y, z). This is the EM setup in my Java code:

    EM em = new EM();
    em.setDebug(false);
    em.setDisplayModelInOldFormat(false);
    em.setMaxIterations(100);
    em.setMinStdDev(0.000001);
    em.buildClusterer(data_to_use);

    The last line (building the clusterer) throws an error, which may be because only one cluster is found. How can I fix this error?

    opened by Alexcsu 1
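As a rough illustration (not Weka's EM implementation), the sketch below shows the E-step "responsibility" computation at the heart of EM for a hypothetical two-component 1-D Gaussian mixture. When one component absorbs nearly every point, the other component's statistics degenerate, which is the kind of single-cluster failure the question describes.

```java
public class EMStepSketch {
    // Density of a 1-D Gaussian with the given mean and standard deviation.
    static double gaussian(double x, double mean, double std) {
        double z = (x - mean) / std;
        return Math.exp(-0.5 * z * z) / (std * Math.sqrt(2 * Math.PI));
    }

    // E-step: p(component 0 | x) for a two-component mixture with equal weights.
    static double responsibility(double x, double m0, double s0, double m1, double s1) {
        double p0 = gaussian(x, m0, s0);
        double p1 = gaussian(x, m1, s1);
        return p0 / (p0 + p1);
    }

    public static void main(String[] args) {
        // A point halfway between two identical components is shared 50/50.
        System.out.println(responsibility(0.5, 0.0, 1.0, 1.0, 1.0)); // 0.5
        // A point far from one component is claimed almost entirely by the other.
        System.out.println(responsibility(-3.0, 0.0, 1.0, 5.0, 1.0)); // ~1.0
    }
}
```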
  • WEKA HierarchicalClusterer class always return 2 clusters

    Here is my code:

    import weka.clusterers.ClusterEvaluation;
    import weka.clusterers.HierarchicalClusterer;
    import weka.clusterers.EM;
    import weka.core.converters.CSVLoader;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.neighboursearch.PerformanceStats;
    
    import java.io.File;
    import java.io.IOException;
    import java.text.ParseException;
    import java.util.ArrayList;
    import java.util.Enumeration;
    
    import weka.core.*;
    
    public class WEKASample1 {
    
    public static void main(String[] args) {
    
        Instances data = null;
        CSVLoader csvLoader = new CSVLoader();
        try {
            csvLoader.setSource(new File("D:\\WEKA\\numbers.csv"));
    
            data = csvLoader.getDataSet();
                    HierarchicalClusterer h = new HierarchicalClusterer();
    
                DistanceFunction d = new DistanceFunction() {
    
            @Override
            public void setOptions(String[] arg0) throws Exception {
    
            }
    
            @Override
            public Enumeration listOptions() {
                return null;
            }
    
            @Override
            public String[] getOptions() {
                return null;
            }
    
            @Override
            public void update(Instance arg0) {
    
            }
    
            @Override
            public void setInvertSelection(boolean arg0) {
    
            }
    
            @Override
            public void setInstances(Instances arg0) {
    
            }
    
            @Override
            public void setAttributeIndices(String arg0) {
    
            }
    
            @Override
            public void postProcessDistances(double[] arg0) {
    
            }
    
            @Override
            public boolean getInvertSelection() {
                return false;
            }
    
            @Override
            public Instances getInstances() {
                return null;
            }
    
            @Override
            public String getAttributeIndices() {
                return null;
            }
    
            @Override
            public double distance(Instance arg0, Instance arg1, double arg2,
                    PerformanceStats arg3) {
                return 0;
            }
    
            @Override
            public double distance(Instance arg0, Instance arg1, double arg2) {
                return 0;
            }
    
            @Override
            public double distance(Instance arg0, Instance arg1, PerformanceStats arg2)
                    throws Exception {
                return 0;
            }
    
            @Override
            public double distance(Instance arg0, Instance arg1) {
    
                double s1 = arg0.value(0);
                double s2 = arg1.value(0);
    
                return Double.POSITIVE_INFINITY;
            }
        };
    
        h.setDistanceFunction(d);
        SelectedTag s = new SelectedTag(1, HierarchicalClusterer.TAGS_LINK_TYPE);
        h.setLinkType(s);
    
        h.buildClusterer(data);
    
    
    //      double[] arr;
    //      for (int i = 0; i < data.size(); i++) {
    //          arr = h.distributionForInstance(data.get(i));
    //          for (int j = 0; j < arr.length; j++)
    //              System.out.print(arr[j] + ",");
    //          System.out.println();
    //      }
    
            System.out.println(h.numberOfClusters());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }  // end of main
    }  // end of class WEKASample1

    Now, the output for the number of clusters generated is always 2, even if I modify the distance function. How do I know which instance is in which cluster? When I uncomment the code above that is written to get the distribution for the instances, I get an ArrayIndexOutOfBoundsException.

    But in general, can anyone explain how WEKA performs the hierarchical clustering here?

    opened by GloryHank 1
  • Create MyClusterer

    I would suggest that yes, you should just recommend other songs drawn from the same cluster as the current song. From the way you've phrased your question, it appears you weren't aware of this, but Weka exposes its own API, containing all of the same classes available internally within the GUI. For classes related to clustering, I'd suggest you take a look at EM, XMeans, and Cobweb, although there are other clustering algorithms you could use as well. The clustering classes all share a fairly consistent design: there is usually a buildClusterer() method that builds the clusterer, and a clusterInstance() method that retrieves the cluster ID for a given song in the database. I actually built a small Java-based clustering demo project a few months ago in an attempt to improve my skills in both Java and Weka at the same time. Feel free to take a look at the source code if you feel it will help.

    opened by GloryHank 1
  • Does it have to be K-means?

    Does it have to be K-means? Another possible approach is to transform your data into a network first, then apply graph clustering. I am the author of MCL, an algorithm used quite often in bioinformatics. The implementation linked to should easily scale up to networks with millions of nodes - your example would have 300K nodes, assuming that you have 100K attributes. With this approach, the data will be naturally pruned in the data transformation step - and that step will quite likely become the bottleneck. How do you compute the distance between two vectors? In the applications that I have dealt with I used the Pearson or Spearman correlation, and MCL is shipped with software to efficiently perform this computation on large scale data (it can utilise multiple CPUs and multiple machines).

    There is still an issue with the data size, as most clustering algorithms will require you to at least perform all pairwise comparisons at least once. Is your data really stored as a giant matrix? Do you have many zeros in the input? Alternatively, do you have a way of discarding smaller elements? Do you have access to more than one machine in order to distribute these computations?

    opened by GloryHank 1
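The answer above suggests computing vector similarity with the Pearson correlation before building the network. As a minimal plain-Java sketch (not MCL's optimized, parallel implementation):

```java
public class Pearson {
    // Pearson correlation of two equal-length vectors:
    // covariance divided by the product of the standard deviations.
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        System.out.println(correlation(new double[]{1, 2, 3}, new double[]{2, 4, 6})); // 1.0
        System.out.println(correlation(new double[]{1, 2, 3}, new double[]{6, 4, 2})); // -1.0
    }
}
```

For 100K-attribute vectors this pairwise step is O(n²·d), which is exactly why the answer warns it tends to become the bottleneck.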
  • MOA's StreamKM clustering doesn't return any result

    I'm currently trying to cluster a great amount of data points into a given amount of clusters and I wanted to try MOA's streaming based k-means StreamKM. A very simple example of what I'm trying to do using random data looks as follows:

    StreamKM streamKM = new StreamKM();
    streamKM.numClustersOption.setValue(5); // default setting
    streamKM.widthOption.setValue(100000); // default setting
    streamKM.prepareForUse();
    for (int i = 0; i < 150000; i++) {
        streamKM.trainOnInstanceImpl(randomInstance(2));
    }
    Clustering result = streamKM.getClusteringResult();
    System.out.println("size = " + result.size());
    System.out.println("dimension = " + result.dimension());
    

    The random instances are created as follows:

    static DenseInstance randomInstance(int size) {
        DenseInstance instance = new DenseInstance(size);
        for (int idx = 0; idx < size; idx++) {
        instance.setValue(idx, Math.random());
        }
        return instance;
    }
    

    However, when running the given code, no clusters seem to be created:

    System.out.println("size = " + result.size()); // size = 0
    System.out.println("dimension = " + result.dimension()); // NPE
    

    Is there anything else I need to take care of, or do I have a fundamental misunderstanding of the MOA clustering concepts?

    opened by lalalaqixiaofei 0
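Independently of MOA's internals, the streaming idea itself can be sketched in plain Java: each arriving point moves its nearest centroid by a shrinking step. This is sequential (online) k-means, not StreamKM's coreset algorithm, and is shown only to illustrate the streaming update rule:

```java
import java.util.Random;

public class OnlineKMeansSketch {
    final double[] centroids;
    final int[] counts;

    OnlineKMeansSketch(double[] initialCentroids) {
        centroids = initialCentroids.clone();
        counts = new int[initialCentroids.length];
    }

    // Sequential k-means: move the nearest centroid toward each new point
    // by 1/n of the gap, where n is how many points it has absorbed.
    void update(double x) {
        int best = 0;
        for (int j = 1; j < centroids.length; j++)
            if (Math.abs(x - centroids[j]) < Math.abs(x - centroids[best])) best = j;
        counts[best]++;
        centroids[best] += (x - centroids[best]) / counts[best];
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        OnlineKMeansSketch km = new OnlineKMeansSketch(new double[]{0.0, 10.0});
        for (int i = 0; i < 10000; i++) {
            // Two well-separated 1-D clusters around 1 and 9.
            km.update((i % 2 == 0 ? 1.0 : 9.0) + rnd.nextGaussian() * 0.5);
        }
        // Both centroids end up near 1.0 and 9.0.
        System.out.println(km.centroids[0] + " " + km.centroids[1]);
    }
}
```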
  • Clustering many sentence using weka lib in java

    I have 5 text files, which I merged into one file containing about 60 sentences. I want to cluster that file into 5 clusters using Weka.

    public static void doClustering(String pathSentences, int numberCluster) throws IOException {

        Helper.deleteAllFileInFolder("results");

        // number of clusters = number of sentences in the file / average sentences per file
        HashMap<Integer, String> sentences = new HashMap<>();
        HashMap<Integer, Integer> clustering = new HashMap<>();
        try {
            StringToWordVector filter = new StringToWordVector();
            SimpleKMeans kmeans = new SimpleKMeans();
            FastVector atts = new FastVector(5);
            atts.addElement(new Attribute("text", (FastVector) null));
            Instances docs = new Instances("text_files", atts, 0);
            Scanner sc = new Scanner(new File(pathSentences));
            int count = 0;
            while (sc.hasNextLine()) {
                String content = sc.nextLine();
                double[] newInst = new double[1];
                newInst[0] = (double) docs.attribute(0).addStringValue(content);
                docs.add(new SparseInstance(1.0, newInst));
                sentences.put(sentences.size(), content);
                clustering.put(clustering.size(), -1);
            }
            NGramTokenizer tokenizer = new NGramTokenizer();
            tokenizer.setNGramMinSize(10);
            tokenizer.setNGramMaxSize(10);
            tokenizer.setDelimiters("\\W");
            filter.setTokenizer(tokenizer);
            filter.setInputFormat(docs);
            filter.setLowerCaseTokens(true);
            filter.setWordsToKeep(1);
            Instances filteredData = Filter.useFilter(docs, filter);
            kmeans.setPreserveInstancesOrder(true);
            kmeans.setNumClusters(numberCluster);
            kmeans.buildClusterer(filteredData);
            int[] assignments = kmeans.getAssignments();

            int i = 0;
            for (int clusterNum : assignments) {
                clustering.put(i, clusterNum);
                i++;
            }
            PrintWriter[] pw = new PrintWriter[numberCluster];
            for (int j = 0; j < numberCluster; j++) {
                pw[j] = new PrintWriter(new File("results/result" + j + ".txt"));
            }
            sentences.entrySet().stream().forEach((entry) -> {
                Integer key = entry.getKey();
                String value = entry.getValue();
                Integer cluster = clustering.get(key);
                pw[cluster].println(value);
            });
            for (int j = 0; j < numberCluster; j++) {
                pw[j].close();
            }
        } catch (Exception e) {
            System.out.println("Error K means " + e);
        }
    }
    

    When I change the order of the input file, the clustering results also vary. Can you help me fix this? Thank you so much.

    opened by lalalaqixiaofei 0
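The variation described above is typical of k-means: the result depends on the random initialization, which in turn can depend on instance order. In Weka's SimpleKMeans one would typically fix the random seed via setSeed(); the plain-Java sketch below (hypothetical, not Weka code) illustrates the underlying remedy by making initialization order-independent: sort the data and seed the centroids from quantiles, so identical point sets always cluster identically.

```java
import java.util.Arrays;

public class DeterministicKMeans {
    // 1-D Lloyd's k-means made order-independent: canonicalize the input by
    // sorting, then seed centroids from quantiles instead of random instances.
    static double[] cluster(double[] data, int k, int iters) {
        double[] x = data.clone();
        Arrays.sort(x);                                   // canonical order
        double[] c = new double[k];
        for (int j = 0; j < k; j++)                       // quantile seeding
            c[j] = x[(int) ((j + 0.5) * x.length / k)];
        for (int it = 0; it < iters; it++) {
            double[] sum = new double[k];
            int[] n = new int[k];
            for (double v : x) {                          // assignment step
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (Math.abs(v - c[j]) < Math.abs(v - c[best])) best = j;
                sum[best] += v;
                n[best]++;
            }
            for (int j = 0; j < k; j++)                   // update step
                if (n[j] > 0) c[j] = sum[j] / n[j];
        }
        return c;
    }

    public static void main(String[] args) {
        double[] a = {9, 1, 10, 2, 1.5, 9.5};
        double[] b = {1, 1.5, 2, 9, 9.5, 10};  // same points, different order
        System.out.println(Arrays.toString(cluster(a, 2, 10))); // [1.5, 9.5]
        System.out.println(Arrays.toString(cluster(b, 2, 10))); // [1.5, 9.5]
    }
}
```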
  • Is it meaningful to get the centroid of clusters generated by CobWeb clustering algorithm?

    I am using the Cobweb clusterer provided by Weka, but there is no function that provides the centroids of the clusters. I wonder if it is meaningful to get the centroids? If not, which points should I pick to represent each cluster?

    opened by lalalaqixiaofei 0
  • Finding probability of belonging to its subcluster and its class

    I have 3000 samples and 10 classes, and I want to cluster them, because I need the sub-clusters of each class. I will use the probability of belonging to a sub-cluster and the probability of belonging to a class. My teacher told me I should use hierarchical clustering to find the number of clusters, but I cannot get good results: when I increase the number of clusters, the samples collect in only a few clusters. I then tried the elbow method with K-Means, looking at the within-cluster sum of squared errors.

    opened by lalalaqixiaofei 0
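The elbow method mentioned above plots the within-cluster sum of squared errors (WCSS) against the number of clusters and picks the bend. Computing WCSS itself is simple; a minimal 1-D sketch in plain Java (illustrative, not Weka's implementation):

```java
public class Wcss {
    // Within-cluster sum of squared errors for 1-D points,
    // given each point's cluster assignment and the cluster centroids.
    static double wcss(double[] x, int[] assign, double[] centroids) {
        double total = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - centroids[assign[i]];
            total += d * d;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 9, 10};
        int[] assign = {0, 0, 1, 1};
        double[] c = {1.5, 9.5};
        System.out.println(wcss(x, assign, c)); // 1.0
    }
}
```

For the elbow method, one would compute this for k = 1, 2, 3, ... and look for the k after which the decrease flattens out.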
  • Unable to use XMeans in Weka 3-7-5 After Installing Via Package Manager

    I am trying to use a clustering algorithm that lets me choose the initial seeds, so I decided to try Weka's XMeans through the Weka GUI. However, when I install XMeans using Weka's package manager, it remains greyed out in the GUI, and I am unable to start clustering even after loading one of Weka's provided test .arff files. Can anyone point me in the right direction or suggest another program or Java library to accomplish such a task?

    opened by lalalaqixiaofei 0
  • Distance/proximity matrix in hierarchical clustering

    I'm new to Weka and I am trying to do hierarchical clustering. I have a symmetric distance/proximity matrix like this:

        a      b      c      d
    a   0      0.1    0.3    0.2
    b   0.1    0      0.7    0.4
    c   0.3    0.7    0      0.9
    d   0.2    0.4    0.9    0
    

    I want to do hierarchical agglomerative clustering with these instances (a, b, c, d, ...). I installed Weka 3.6.11, but I couldn't find any way to pass this distance/proximity matrix in the Cluster tab. Can anyone help me? Is there any easy way in other environments for this purpose? Thanks in advance.

    opened by lalalaqixiaofei 0
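Weka's HierarchicalClusterer is built around instances plus a DistanceFunction rather than a precomputed matrix, which is why the Cluster tab offers no obvious way to feed the matrix in. For illustration only, single-linkage agglomerative clustering over the matrix above can be done directly in a few lines of plain Java:

```java
import java.util.ArrayList;
import java.util.List;

public class SingleLinkage {
    // Agglomerative clustering over an explicit distance matrix: repeatedly
    // merge the two clusters whose closest members are nearest (single linkage),
    // logging each merge as "left+right@distance".
    static List<String> merges(String[] names, double[][] d) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < names.length; i++)
            clusters.add(new ArrayList<>(List.of(i)));
        List<String> log = new ArrayList<>();
        while (clusters.size() > 1) {
            int bi = 0, bj = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++)
                for (int j = i + 1; j < clusters.size(); j++) {
                    double min = Double.POSITIVE_INFINITY;  // closest cross-pair
                    for (int p : clusters.get(i))
                        for (int q : clusters.get(j))
                            min = Math.min(min, d[p][q]);
                    if (min < best) { best = min; bi = i; bj = j; }
                }
            log.add(label(clusters.get(bi), names) + "+"
                    + label(clusters.get(bj), names) + "@" + best);
            clusters.get(bi).addAll(clusters.remove(bj));
        }
        return log;
    }

    static String label(List<Integer> c, String[] names) {
        StringBuilder sb = new StringBuilder();
        for (int i : c) sb.append(names[i]);
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] d = {
            {0,   0.1, 0.3, 0.2},
            {0.1, 0,   0.7, 0.4},
            {0.3, 0.7, 0,   0.9},
            {0.2, 0.4, 0.9, 0}
        };
        for (String m : merges(new String[]{"a", "b", "c", "d"}, d))
            System.out.println(m); // a+b@0.1, ab+d@0.2, abd+c@0.3
    }
}
```

On the matrix from the question, a and b merge first at 0.1, then d joins them at 0.2, and c joins last at 0.3, giving the full dendrogram.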