org.imirsel.m2k.util
Class CreateTestAndTrainSets

java.lang.Object
  extended by java.util.Observable
      extended by ncsa.d2k.core.modules.RootModule
          extended by ncsa.d2k.core.modules.ConfigurableModule
              extended by ncsa.d2k.core.modules.EmbeddedPipeModule
                  extended by ncsa.d2k.core.modules.ExecModule
                      extended by ncsa.d2k.core.modules.IOModule
                          extended by ncsa.d2k.core.modules.DataPrepModule
                              extended by org.imirsel.m2k.util.CreateTestAndTrainSets
All Implemented Interfaces:
java.lang.Cloneable, ncsa.d2k.core.modules.Module, java.io.Serializable, ncsa.d2k.core.modules.SystemModule

public class CreateTestAndTrainSets
extends ncsa.d2k.core.modules.DataPrepModule

A module that takes FileListWithClass objects, divides them into a test set and a training set, and writes the filenames and class metadata out to the specified files. Optionally, this module can replace the String class labels with integers.

Author:
kw
See Also:
Serialized Form
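
The division into training and test sets described above can be illustrated with a minimal sketch. This page does not say how the module selects items for each set, so the seeded shuffle followed by a single cut is an assumption, as is rounding the training count down. SplitSketch, trainCount and split are hypothetical names and are not part of M2K or D2K.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of M2K: shuffles items and cuts them into two lists.
class SplitSketch {

    /** Number of items assigned to training: floor(total * proportion). Rounding down is an assumption. */
    static int trainCount(int total, double trainingProportion) {
        if (trainingProportion < 0.0 || trainingProportion > 1.0) {
            throw new IllegalArgumentException("proportion must be in [0, 1]");
        }
        return (int) Math.floor(total * trainingProportion);
    }

    /** Returns a two-element list: index 0 is the training set, index 1 is the test set. */
    static <T> List<List<T>> split(List<T> items, double trainingProportion, long seed) {
        // Copy first so the caller's list is left untouched.
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = trainCount(shuffled.size(), trainingProportion);
        List<List<T>> result = new ArrayList<>();
        result.add(new ArrayList<>(shuffled.subList(0, cut)));
        result.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.wav", "b.wav", "c.wav", "d.wav", "e.wav");
        List<List<String>> parts = split(files, 0.8, 42L);
        System.out.println("train: " + parts.get(0));
        System.out.println("test:  " + parts.get(1));
    }
}
```

With five files and a proportion of 0.8, four files land in the training list and one in the test list. Which files end up where depends on the seed.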

Field Summary
 
Fields inherited from class ncsa.d2k.core.modules.ConfigurableModule
addedInputInfo, addedInputNames, addedInputTypes, addedOutputInfo, addedOutputNames, addedOutputTypes, addPortListeners
 
Fields inherited from class ncsa.d2k.core.modules.RootModule
alias, children, DEBUG_LEVEL, EMPTY_INPUT, ERROR_LEVEL, executionManager, FATAL_LEVEL, HAVE_INPUT, INFO_LEVEL, iPipes, iPipesIndex, iTriggers, oPipes, oPipesIndex, oTriggers, parents, pipesFullManager, stats, triggerChildren, triggerParents, WARN_LEVEL
 
Fields inherited from interface ncsa.d2k.core.modules.SystemModule
BLOCKED_STATE, IDLE_STATE, MEDIUM, MEDIUM_RARE, MEDIUM_WELL, RARE, READY_STATE, WELL, WORKING_STATE
 
Constructor Summary
CreateTestAndTrainSets()
          Creates a new instance of CreateTestAndTrainSets
 
Method Summary
 void beginExecution()
          Clears variables before each execution of an itinerary that contains this module
protected  void doit()
          Takes FileListWithClass objects, divides them into a test set and a training set, and writes the filenames and class metadata out to the specified files.
 java.lang.String getInputInfo(int param)
          Returns a text description for the indicated input
 java.lang.String getInputName(int i)
          Returns a text name for the given input
 java.lang.String[] getInputTypes()
          Returns an array of strings containing the Java data types of the inputs.
 java.lang.String getModuleInfo()
          Returns information about the module
 int getNumArraysExpected()
          Returns the number of arrays to receive before outputting
 java.lang.String getOutputInfo(int param)
          Returns a text description for the indicated output
 java.lang.String getOutputName(int i)
          Returns a text name for the given output
 java.lang.String[] getOutputTypes()
          Returns an array of strings containing the Java data types of the outputs.
 ncsa.d2k.core.modules.PropertyDescription[] getPropertiesDescriptions()
          Returns an array of description objects for each property of the Module.
 java.lang.String getTestFileName()
          Returns the filename that the testing FileListWithClass objects will be written to.
 java.lang.String getTrainFileName()
          Returns the filename that the training FileListWithClass objects will be written to.
 double getTrainingProportion()
          Returns the proportion of the data to be used for the training set
 boolean getUseIntegerLabels()
          Returns the value of the flag that controls whether the String labels will be replaced with integers.
 java.lang.String getWorkingDir()
          Returns the working directory that the FileListWithClass objects will be written to.
 boolean isReady()
          Controls whether the module is able to run based on the input flags and whether any class names have been extracted from FileListWithClass objects.
 void setNumArraysExpected(int val)
          Sets the number of arrays to receive before outputting
 void setTestFileName(java.lang.String file)
          Sets the filename that the testing FileListWithClass objects will be written to.
 void setTrainFileName(java.lang.String file)
          Sets the filename that the training FileListWithClass objects will be written to.
 void setTrainingProportion(double TrainingProportion_)
          Sets the proportion of the data to be used for the training set
 void setUseIntegerLabels(boolean useIntLabels)
          Sets the flag that controls whether the String labels will be replaced with integers.
 void setWorkingDir(java.lang.String path)
          Sets the working directory that the FileListWithClass objects will be written to.
 
Methods inherited from class ncsa.d2k.core.modules.ExecModule
execute, setExecutionManager
 
Methods inherited from class ncsa.d2k.core.modules.ConfigurableModule
addAddPortListener, addInput, addInputTrigger, addOutput, addOutputTrigger, getAddedInputTypes, getAddedOutputTypes, getAddPortListeners, insertInput, insertOutput, removeAddPortListener, removeInput, removeInputTrigger, removeOutput, removeOutputTrigger, setAddedInputTypes, setInputType, setOutputType
 
Methods inherited from class ncsa.d2k.core.modules.RootModule
activateTriggers, begin, canRun, clone, debug, debug, disconnectInputPipe, disconnectInputTriggers, disconnectOutputPipe, disconnectOutputTriggers, end, endExecution, error, error, fatal, fatal, fetchInputs, getAlias, getChildIndex, getChildMux, getChildren, getExecutionManager, getFile, getFlags, getFullPipeManager, getGuiComponent, getImage, getInputPipes, getInputPipeSize, getInputTriggers, getModuleName, getModuleStatistics, getNumInputs, getNumOutputs, getOutputCounts, getOutputPipes, getOutputPipeSize, getOutputTriggers, getParentIndex, getParentMux, getParents, getPipesFull, getPriority, getPropertyEditor, getResource, getRootName, getState, info, info, initModule, isAborting, isHead, isInputPipeConnected, isOutputPipeConnected, pullInput, pushOutput, resetInputs, setAlias, setBlocked, setD2KModulesLoggingLevel, setFlags, setFullPipeManager, setIdle, setInputPipe, setLogLevel, setModuleStatistics, setModuleStatisticsByMachine, setOutputCounts, setOutputPipe, setPipesFull, setReady, setResource, setState, setThePriority, setWorking, trigger, triggersActivated, warn, warn
 
Methods inherited from class java.util.Observable
addObserver, clearChanged, countObservers, deleteObserver, deleteObservers, hasChanged, notifyObservers, notifyObservers, setChanged
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CreateTestAndTrainSets

public CreateTestAndTrainSets()
Creates a new instance of CreateTestAndTrainSets

Method Detail

setNumArraysExpected

public void setNumArraysExpected(int val)
Sets the number of arrays to receive before outputting

Parameters:
val - number of arrays expected

getNumArraysExpected

public int getNumArraysExpected()
Returns the number of arrays to receive before outputting

Returns:
number of arrays expected
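
The "receive N arrays, then output" behaviour that this property controls can be pictured with the following sketch. It illustrates the general pattern only, under the assumption that inputs are buffered until the expected count is reached. ArrayCollectorSketch, accept and drain are hypothetical names, and the module's real buffering logic is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "collect N inputs, then output"; not the module's actual code.
class ArrayCollectorSketch {
    private final int numArraysExpected;
    private final List<String[]> received = new ArrayList<>();

    ArrayCollectorSketch(int numArraysExpected) {
        this.numArraysExpected = numArraysExpected;
    }

    /** Stores one incoming array and returns true once the expected number has arrived. */
    boolean accept(String[] array) {
        received.add(array);
        return received.size() >= numArraysExpected;
    }

    /** Hands back everything collected so far and resets the collector for the next run. */
    List<String[]> drain() {
        List<String[]> out = new ArrayList<>(received);
        received.clear();
        return out;
    }

    public static void main(String[] args) {
        ArrayCollectorSketch collector = new ArrayCollectorSketch(2);
        System.out.println(collector.accept(new String[] {"a.wav"})); // first of two: not complete yet
        System.out.println(collector.accept(new String[] {"b.wav"})); // second of two: complete
        System.out.println(collector.drain().size());
    }
}
```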

getPropertiesDescriptions

public ncsa.d2k.core.modules.PropertyDescription[] getPropertiesDescriptions()
Returns an array of description objects for each property of the Module.

Returns:
an array of description objects for each property of the Module.

getTestFileName

public java.lang.String getTestFileName()
Returns the filename that the testing FileListWithClass objects will be written to.

Returns:
the filename that the testing FileListWithClass objects will be written to.

setTestFileName

public void setTestFileName(java.lang.String file)
Sets the filename that the testing FileListWithClass objects will be written to.

Parameters:
file - the filename that the testing FileListWithClass objects will be written to.

getTrainFileName

public java.lang.String getTrainFileName()
Returns the filename that the training FileListWithClass objects will be written to.

Returns:
the filename that the training FileListWithClass objects will be written to.

setTrainFileName

public void setTrainFileName(java.lang.String file)
Sets the filename that the training FileListWithClass objects will be written to.

Parameters:
file - the filename that the training FileListWithClass objects will be written to.

getUseIntegerLabels

public boolean getUseIntegerLabels()
Returns the value of the flag that controls whether the String labels will be replaced with integers.

Returns:
the value of the flag that controls whether the String labels will be replaced with integers.

setUseIntegerLabels

public void setUseIntegerLabels(boolean useIntLabels)
Sets the flag that controls whether the String labels will be replaced with integers.

Parameters:
useIntLabels - the value of the flag that controls whether the String labels will be replaced with integers.
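
Replacing String labels with integers can be sketched as below. This page does not say which integer each label receives, so numbering labels from 0 in order of first appearance is an assumption. IntegerLabelSketch, buildIndex and encode are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of M2K: maps String class labels to integer codes.
class IntegerLabelSketch {

    /** Assigns 0, 1, 2, ... to labels in order of first appearance (an assumed scheme). */
    static Map<String, Integer> buildIndex(List<String> labels) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String label : labels) {
            if (!index.containsKey(label)) {
                index.put(label, index.size());
            }
        }
        return index;
    }

    /** Replaces each label with its integer code; unknown labels are rejected. */
    static List<Integer> encode(List<String> labels, Map<String, Integer> index) {
        List<Integer> codes = new ArrayList<>();
        for (String label : labels) {
            Integer code = index.get(label);
            if (code == null) {
                throw new IllegalArgumentException("unknown label: " + label);
            }
            codes.add(code);
        }
        return codes;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("rock", "jazz", "rock", "classical");
        Map<String, Integer> index = buildIndex(labels);
        System.out.println(index);                 // {rock=0, jazz=1, classical=2}
        System.out.println(encode(labels, index)); // [0, 1, 0, 2]
    }
}
```

Building the index once over all labels, before the split, would keep the codes consistent between the training and test files.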

setWorkingDir

public void setWorkingDir(java.lang.String path)
Sets the working directory that the FileListWithClass objects will be written to.

Parameters:
path - the path to the working directory that the FileListWithClass objects will be written to.

getWorkingDir

public java.lang.String getWorkingDir()
Returns the working directory that the FileListWithClass objects will be written to.

Returns:
the path to the working directory that the FileListWithClass objects will be written to.

setTrainingProportion

public void setTrainingProportion(double TrainingProportion_)
Sets the proportion of the data to be used for the training set

Parameters:
TrainingProportion_ - the proportion of the data to be used for the training set

getTrainingProportion

public double getTrainingProportion()
Returns the proportion of the data to be used for the training set

Returns:
the proportion of the data to be used for the training set

isReady

public boolean isReady()
Controls whether the module is able to run based on the input flags and whether any class names have been extracted from FileListWithClass objects.

Returns:
a flag indicating whether the module is ready to run.

beginExecution

public void beginExecution()
Clears variables before each execution of an itinerary that contains this module


doit

protected void doit()
             throws java.lang.Exception
Takes FileListWithClass objects, divides them into a test set and a training set, and writes the filenames and class metadata out to the specified files. Optionally replaces the String class labels with integers.

Throws:
java.lang.Exception - if an I/O error occurs
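
The output step can be sketched as follows, assuming the file name is resolved against the working directory and each entry is written as one line. The on-disk format is not specified on this page, so the tab-separated "filename, label" layout is an assumption. WriteListSketch and writeList are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of M2K: writes filename/label pairs to a text file.
class WriteListSketch {

    /**
     * Writes one "filename<TAB>label" line per entry to workingDir/fileName
     * and returns the path written. The line format is an assumption.
     */
    static Path writeList(String workingDir, String fileName, List<String[]> entries)
            throws IOException {
        Path dir = Paths.get(workingDir);
        Files.createDirectories(dir); // no-op if the directory already exists
        List<String> lines = new ArrayList<>();
        for (String[] entry : entries) {
            lines.add(entry[0] + "\t" + entry[1]);
        }
        return Files.write(dir.resolve(fileName), lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("m2k-sketch");
        List<String[]> train = Arrays.asList(
                new String[] {"a.wav", "rock"},
                new String[] {"b.wav", "jazz"});
        Path out = writeList(tmp.toString(), "train.txt", train);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

The same call would be made twice, once with the training entries and the training file name, and once with the test entries and the test file name.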

getInputInfo

public java.lang.String getInputInfo(int param)
Returns a text description for the indicated input

Parameters:
param - the index of the input
Returns:
a text description of the indexed input

getInputTypes

public java.lang.String[] getInputTypes()
Returns an array of strings containing the Java data types of the inputs.

Returns:
the fully qualified Java types for each of the inputs

getInputName

public java.lang.String getInputName(int i)
Returns a text name for the given input

Parameters:
i - the index of the input
Returns:
the name of the indexed input

getModuleInfo

public java.lang.String getModuleInfo()
Returns information about the module

Returns:
Module information

getOutputInfo

public java.lang.String getOutputInfo(int param)
Returns a text description for the indicated output

Parameters:
param - the index of the output
Returns:
a text description of the indexed output

getOutputTypes

public java.lang.String[] getOutputTypes()
Returns an array of strings containing the Java data types of the outputs.

Returns:
the fully qualified Java types for each of the outputs.

getOutputName

public java.lang.String getOutputName(int i)
Returns a text name for the given output

Parameters:
i - the index of the output
Returns:
the name of the indexed output