Explore graphs for drug treatment
Last updated: Dec 11, 2024
This tutorial provides an example of how a medical researcher can compile and visualize data for a study. The researcher collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications. Part of your job is to use data mining to find out which drug might be appropriate for a future patient with the same illness.

Try the tutorial

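In this tutorial, you will complete these tasks:

  Task 1: Open the sample project
  Task 2: Examine the Data Asset
  Task 3: Explore the distribution and data audit charts
  Task 4: Create and explore the Scatter plot
  Task 5: Create and explore the web chart
  Task 6: Explore advanced visualizations
  Task 7: Explore the Derive node
  Task 8: Explore the Filter and Type nodes
  Task 9: Generate the model
  Task 10: Create an Analysis node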

Sample modeler flow and data set

This tutorial uses the Drug Treatment - Exploratory Graphs flow in the sample project. The data file used is drug1n.csv. The following image shows the sample modeler flow.

Figure 1. Sample modeler flow

The data fields that are used in this example are:
Data field     Description
Age            Age of patient (number)
Sex            M or F
BP             Blood pressure: HIGH, NORMAL, or LOW
Cholesterol    Blood cholesterol: NORMAL or HIGH
Na             Blood sodium concentration
K              Blood potassium concentration
Drug           Prescription drug to which a patient responded
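
For readers who want to inspect the same fields outside SPSS Modeler, the following is a minimal sketch in Python of how records with these seven fields could be parsed. The column order, the drug labels (such as drugY), and the two sample rows are assumptions made up for illustration; they are not taken from drug1n.csv.

```python
import csv
import io

# Two made-up rows for illustration only; they are NOT taken from drug1n.csv.
# The header order and the drug labels are assumptions.
SAMPLE_CSV = """Age,Sex,BP,Cholesterol,Na,K,Drug
45,F,HIGH,NORMAL,0.75,0.05,drugY
30,M,LOW,HIGH,0.60,0.06,drugC
"""

# Fields that the table describes as numbers or concentrations.
NUMERIC_FIELDS = ("Age", "Na", "K")


def load_records(text):
    """Parse CSV text into a list of dicts, converting numeric fields to float."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for name in NUMERIC_FIELDS:
            row[name] = float(row[name])
        records.append(row)
    return records


records = load_records(SAMPLE_CSV)
for record in records:
    print(record)
```

To try the sketch on the real file, you could pass the contents of drug1n.csv to load_records instead, assuming that its header row uses the field names listed in the table.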

Task 1: Open the sample project

The sample project contains several data sets and sample modeler flows. If you don't already have the sample project, then refer to the Tutorials topic to create the sample project. Then follow these steps to open the sample project:

  1. In Cloud Pak for Data, from the Navigation menu, choose Projects > View all Projects.
  2. Click SPSS Modeler Project.
  3. Click the Assets tab to see the data sets and modeler flows.

Check your progress

The following image shows the project Assets tab. You are now ready to work with the sample modeler flow associated with this tutorial.

Sample project

Task 2: Examine the Data Asset

Drug Treatment - Exploratory Graphs includes several nodes. Follow these steps to examine the Data Asset node:

  1. From the Assets tab, open the Drug Treatment - Exploratory Graphs modeler flow, and wait for the canvas to load.
  2. Double-click the drug1n.csv node. This node is a Data Asset node that points to the drug1n.csv file in the project.
  3. Review the File format properties.
  4. Optional: Click Preview data to see the full data set.

Check your progress

The following image shows the Data Asset node. You are now ready to explore the distribution and data audit charts.

Data Asset

Task 3: Explore the distribution and data audit charts

During data mining, it is often useful to explore the data by creating visual summaries. SPSS Modeler offers many different types of charts to choose from, depending on the type of data you want to summarize. For example, to find out what proportion of the patients responded to each drug, explore a Drug type (Distribution) node. Follow these steps to explore some charts:

  1. Double-click the Drug type (Distribution) node to see its properties.
  2. Click Cancel.
  3. Hover over the Drug type (Distribution) node and click the Run icon.
  4. In the Outputs and models pane, click the Drug type output to view the results.
Figure 2. View Output: Drug type

The chart helps you see the shape of the data. It shows that patients responded to drug Y most often and to drugs B and C least often.

Alternatively, you can attach and run a 7 Fields (Data Audit) node to see distributions and histograms for all fields at once.

  1. Double-click the 7 Fields (Data Audit) output node after the Data Asset node.
  2. Hover over the 7 Fields (Data Audit) node and click the Run icon.
  3. In the Outputs and models pane, click the 7 Fields (Data Audit) output to view the results.
Figure 3. View Output: Data Audit of [7 fields]
Data Audit chart
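
The two summaries above can be approximated outside SPSS Modeler as well. The following is a rough sketch, not the method that the Distribution or Data Audit nodes use internally: it computes the share of records per drug and a small min/max/mean summary for one numeric field. The records and drug labels are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Made-up records for illustration only; not taken from drug1n.csv.
records = [
    {"Age": 45, "Drug": "drugY"},
    {"Age": 30, "Drug": "drugY"},
    {"Age": 61, "Drug": "drugX"},
    {"Age": 52, "Drug": "drugA"},
]


def proportions(rows, field):
    """Return {category: share of rows} for a categorical field."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def audit_numeric(rows, field):
    """Return the minimum, maximum, and mean of a numeric field."""
    values = [row[field] for row in rows]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


drug_share = proportions(records, "Drug")
age_summary = audit_numeric(records, "Age")
print(drug_share)
print(age_summary)
```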

Check your progress

The following image shows the flow. You are now ready to create and explore the Scatter plot.

Modeler flow with Outputs and models pane displayed

Task 4: Create and explore the Scatter plot

You can see what factors might influence Drug, the target variable. As a researcher, you know that the concentrations of sodium and potassium in the blood are important factors. Since these concentrations are both numeric values, you can create a Scatter plot of sodium versus potassium that uses the drug categories as a color overlay. Follow these steps to create and explore the scatter plot:

  1. From the Graphs section in the palette, drag the Plot node onto the canvas.
  2. Hover over the node, click the Edit Title button, and rename it to Na v. K.
  3. Connect the Plot node to the drug1n.csv data asset node.
  4. Double-click the Na v. K (Plot) node to edit its properties.
  5. In the Plot section, select Na as the X field, K as the Y field, and in the Overlay section, select Drug as the Color field.
  6. Click Save.
  7. Hover over the Na v. K (Plot) node and click the Run icon.
  8. In the Outputs and models pane, click the Na v. K output to view the results.

The plot clearly shows a threshold. Above the threshold, drug Y is always the correct drug; below it, drug Y is never the correct drug. The threshold is not a fixed value of either field but a ratio: the ratio of sodium (Na) to potassium (K).

Check your progress

The following image shows the scatter plot. You are now ready to create and explore the web chart.

Scatter plot of drug distribution

Task 5: Create and explore the web chart

Since many of the data fields are categorical, you can also try plotting a web chart, which maps associations between different categories. Follow these steps to explore a web chart:

  1. From the Graphs section in the palette, drag the Web node onto the canvas and connect it to the drug1n.csv data asset node.
  2. Double-click the Web node to edit its properties.
  3. In the Fields section, click Add columns. Select the BP (for blood pressure) and Drug columns.
  4. Click Save.
  5. Hover over the Web node and click the Run icon.
  6. In the Outputs and models pane, click the Web output to view the results.

The plot suggests that drug Y is associated with all three levels of blood pressure. This result is no surprise; you already determined the situation in which drug Y is best.

But if you ignore drug Y and focus on the other drugs, you can see that drugs A and B are associated with high blood pressure, drugs C and X are associated with low blood pressure, and drug X is also associated with normal blood pressure. However, you still don't know how to choose between drugs A and B, or between drugs C and X, for a specific patient. Modeling can help in this case.
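
The associations that a web chart draws are based on how often two categories occur together. As a hedged illustration only (this is not how the Web node is implemented), the following Python sketch counts co-occurrences of blood pressure and drug. The pairs and drug labels are made up; they are not taken from drug1n.csv.

```python
from collections import Counter

# Made-up (BP, Drug) pairs for illustration only; not taken from drug1n.csv.
pairs = [
    ("HIGH", "drugA"),
    ("HIGH", "drugB"),
    ("HIGH", "drugY"),
    ("LOW", "drugC"),
    ("LOW", "drugX"),
    ("NORMAL", "drugX"),
    ("NORMAL", "drugY"),
    ("NORMAL", "drugX"),
]


def co_occurrence(bp_drug_pairs):
    """Count how often each (BP, Drug) combination appears."""
    return Counter(bp_drug_pairs)


links = co_occurrence(pairs)
for (bp, drug), n in sorted(links.items()):
    print(f"{bp:<7} {drug:<6} {n}")
```

A combination that never occurs, such as low blood pressure with drug A in this made-up sample, has a count of zero.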

Check your progress

The following image shows the web plot. You are now ready to explore advanced visualizations.

Web graph of drugs vs. blood pressure

Task 6: Explore advanced visualizations

The previous sections use different types of graph nodes. Another way to explore data is with the advanced visualizations feature. Follow these steps to create and explore advanced charts:

  1. From the Graphs section in the palette, drag the Charts node onto the canvas and connect it to the drug1n.csv data asset node.
  2. Double-click the Charts node to see its properties.
  3. Click the Launch Chart Builder button.

    Here you can choose and create advanced charts to explore your data from different perspectives and identify patterns, connections, and relationships within your data. Experiment with creating some charts before you return to the modeler flow.

Check your progress

The following image shows an example 3D chart. You are now ready to explore the Derive node.

Advanced visualizations

Task 7: Explore the Derive node

As you saw with the scatter plot from Task 4, the ratio of sodium to potassium seems to predict when to use drug Y. You can derive a field that contains the value of this ratio for each record. This field might be useful later when you build a model to predict when to use each of the five drugs.

Follow these steps to explore the Derive node:

  1. Double-click the Na_to_K (Derive) node to edit its properties.
  2. Look at the Expression section. The expression is Na/K because you obtain the value of the new field by dividing the sodium value by the potassium value.

    You can also create an expression by clicking the calculator icon to open the Expression Builder, which lets you interactively create expressions by using built-in lists of functions, operands, and fields and their values.
  3. Click Cancel to return to the properties, and click Cancel again to return to the flow.
  4. From the Graphs section in the palette, drag the Histogram node onto the canvas and connect it to the Na_to_K (Derive) node.
  5. Double-click the Histogram node to see its properties.
  6. In the Histogram node properties, specify Na_to_K as the field to be plotted and Drug as the color overlay field.
  7. Click Save.
  8. Hover over the Histogram node, and click the Run icon.
  9. In the Outputs and models pane, click the Histogram output to view the results.

Based on the chart, you can conclude that when the Na_to_K value is around 15 or more, drug Y is the drug of choice.
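
The derivation itself is a single division per record. The following Python sketch mirrors the Na/K expression and compares each result with the approximate value of 15 read from the histogram. The records and drug labels are made up for illustration, and the helper name derive_na_to_k is hypothetical.

```python
# Made-up records for illustration only; not taken from drug1n.csv.
records = [
    {"Na": 0.80, "K": 0.04, "Drug": "drugY"},
    {"Na": 0.60, "K": 0.06, "Drug": "drugX"},
    {"Na": 0.70, "K": 0.07, "Drug": "drugA"},
]

# Approximate value read off the histogram in this tutorial.
THRESHOLD = 15


def derive_na_to_k(rows):
    """Return copies of the rows with an added Na_to_K field (Na divided by K)."""
    return [{**row, "Na_to_K": row["Na"] / row["K"]} for row in rows]


derived = derive_na_to_k(records)
for row in derived:
    side = "at or above" if row["Na_to_K"] >= THRESHOLD else "below"
    print(f'{row["Drug"]}: Na_to_K = {row["Na_to_K"]:.2f} ({side} {THRESHOLD})')
```

The function returns new dictionaries instead of modifying the input, which loosely matches how a Derive node adds a field to the data that passes through it and leaves the source data unchanged.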

Check your progress

The following image shows the histogram. You are now ready to explore the Filter and Type nodes.

Histogram node

Task 8: Explore the Filter and Type nodes

By exploring and manipulating the data, you are able to form some hypotheses. The ratio of sodium to potassium in the blood seems to affect the choice of drug, as does blood pressure. But you cannot fully explain all of the relationships yet. Modeling can provide some answers. First, follow these steps to explore the Filter and Type nodes:

  1. Double-click the Discard Fields (Filter) node to see its properties.
  2. Notice that because the derived field Na_to_K is used, the original fields Na and K are filtered out so that they are not used twice in the modeling algorithm.

    Figure 4. Filter node properties
  3. Click Cancel.
  4. Double-click the Define Types (Type) node to see its properties.
  5. With the Type node, you can indicate the types of fields you're using and how they're used to predict the outcomes. Notice that the role for the Drug field is set to Target, indicating that Drug is the field you want to predict. The role for the other fields is set to Input so they are used as predictors.

    Figure 5. Type node properties
  6. Click Cancel.
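
The effect of these two nodes can be sketched in a few lines of Python. This is only an illustration of the idea, assuming records are held as dictionaries; the helper names discard_fields and assign_roles are hypothetical, and the sample record is made up.

```python
# Fields removed by the Filter node and the target set in the Type node,
# as described in the steps above.
DROPPED_FIELDS = {"Na", "K"}
TARGET_FIELD = "Drug"


def discard_fields(row, dropped=DROPPED_FIELDS):
    """Return a copy of the record without the dropped fields."""
    return {name: value for name, value in row.items() if name not in dropped}


def assign_roles(field_names, target=TARGET_FIELD):
    """Mark the target field as Target and every other field as Input."""
    return {name: ("Target" if name == target else "Input") for name in field_names}


# Made-up record for illustration only; not taken from drug1n.csv.
row = {
    "Age": 45, "Sex": "F", "BP": "HIGH", "Cholesterol": "NORMAL",
    "Na": 0.80, "K": 0.04, "Na_to_K": 20.0, "Drug": "drugY",
}

filtered = discard_fields(row)
roles = assign_roles(filtered)
print(filtered)
print(roles)
```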

Check your progress

The following image shows the flow. You are now ready to generate the model.

Modeler flow with Outputs and models pane displayed

Task 9: Generate the model

Follow these steps to generate the model by using a C5.0 node:

  1. Hover over the Drug (C5.0) node and click the Run icon.
  2. In the Outputs and models pane, click the Drug model to view the results.

    The Tree Diagram displays the set of rules that are generated by the C5.0 node in a tree format. Now, you can see the missing pieces of the puzzle. For people with an Na-to-K ratio less than 14.829 and high blood pressure, age determines the choice of drug. For people with low blood pressure, cholesterol level seems to be the best predictor.

    You can hover over the nodes in the tree to see more details such as the number of cases for each blood pressure category and the confidence percentage of cases.
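
The rules described above can be read as nested conditions. The following Python sketch encodes only the splits that this tutorial states and does not reproduce the full tree. It assumes that a ratio at or above 14.829 leads to drug Y, which is inferred from the histogram discussion; how the tree treats a value exactly equal to the split is also an assumption. The function name is hypothetical.

```python
# Split value reported for the tree diagram in this tutorial.
RATIO_SPLIT = 14.829


def next_deciding_field(na_to_k, bp):
    """Return the field that the described rules consult next for a patient.

    Only the splits stated in the tutorial text are encoded. The age and
    cholesterol cut-offs are not given there, so they are not guessed here.
    """
    if na_to_k >= RATIO_SPLIT:  # assumption: equality falls on the drug Y side
        return "none (drug Y)"
    if bp == "HIGH":
        return "Age"
    if bp == "LOW":
        return "Cholesterol"
    return "not stated in this tutorial"


print(next_deciding_field(20.0, "HIGH"))
print(next_deciding_field(10.0, "HIGH"))
print(next_deciding_field(10.0, "LOW"))
print(next_deciding_field(10.0, "NORMAL"))
```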

Check your progress

The following image shows the tree diagram. You are now ready to create an Analysis node.

Tree Diagram output

Task 10: Create an Analysis node

Follow these steps to assess the accuracy of the model by using an Analysis node:

  1. From the Outputs section in the palette, drag the Analysis node onto the canvas and connect it to the Drug (C5.0) model nugget.
  2. Hover over the Analysis node and click the Run icon.
  3. In the Outputs and models pane, click the Analysis of [Drug] output to view the results.

    The Analysis node output shows that with this artificial dataset, the model correctly predicted the choice of drug for every record in the dataset. With a real dataset you are unlikely to see 100% accuracy, but you can use the Analysis node to help determine whether the model is acceptably accurate for your particular application.
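
The accuracy figure is the share of records for which the predicted drug matches the actual drug. The following Python sketch shows that calculation under the assumption that both are available as equal-length lists. The labels are made up for illustration, and the Analysis node reports more than this single number.

```python
def accuracy(actual, predicted):
    """Return the fraction of records where the prediction equals the actual value."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for illustration only; not results from the tutorial's model.
actual = ["drugY", "drugX", "drugA", "drugC"]
perfect = ["drugY", "drugX", "drugA", "drugC"]
imperfect = ["drugY", "drugX", "drugB", "drugC"]

print(accuracy(actual, perfect))
print(accuracy(actual, imperfect))
```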

Check your progress

The following image shows the Analysis output.

Analysis output

Summary

This example showed you how to create and explore graphs for drug treatment and use them to find out which drug might be appropriate for a future patient with the same illness.

Next steps

You are now ready to try other SPSS® Modeler tutorials.
