GTN Tutorial: Data manipulation Olympics - all steps and exercises

data-science-data-manipulation-olympics/data-manipulation-olympics-with-exercises

Author(s)

Version: 0
Last updated: Nov 13, 2024
License: None specified, defaults to CC-BY-4.0
Tags: introduction

Features

Tutorial: Data Manipulation Olympics

Workflow Testing
Tests: ❌
Results: Not yet automated
FAIRness PURL: https://gxy.io/GTN:
flowchart TD
  0["ℹ️ Input Dataset\nolympics.tsv"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["ℹ️ Input Dataset\ncountry-information.tsv"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["ℹ️ Input Dataset\nolympics_2022.tsv"];
  style 2 stroke:#2c3143,stroke-width:4px;
  3["Line/Word/Character count"];
  0 -->|output| 3;
  4["tabular-to-csv"];
  0 -->|output| 4;
  5["Sort"];
  0 -->|output| 5;
  6["Sort"];
  0 -->|output| 6;
  7["Sort"];
  0 -->|output| 7;
  8["Sort"];
  0 -->|output| 8;
  9["Sort"];
  0 -->|output| 9;
  10["Sort"];
  0 -->|output| 10;
  11["Filter"];
  0 -->|output| 11;
  12["Filter"];
  0 -->|output| 12;
  13["Count"];
  0 -->|output| 13;
  14["Count"];
  0 -->|output| 14;
  15["Count"];
  0 -->|output| 15;
  16["Count"];
  0 -->|output| 16;
  17["Datamash"];
  0 -->|output| 17;
  18["Count"];
  0 -->|output| 18;
  19["Datamash"];
  0 -->|output| 19;
  20["Filter"];
  0 -->|output| 20;
  21["Filter"];
  0 -->|output| 21;
  22["Filter"];
  0 -->|output| 22;
  23["Filter"];
  0 -->|output| 23;
  24["Filter"];
  0 -->|output| 24;
  25["Filter"];
  0 -->|output| 25;
  26["Filter"];
  0 -->|output| 26;
  27["Compute"];
  0 -->|output| 27;
  28["Compute"];
  0 -->|output| 28;
  29["Cut"];
  0 -->|output| 29;
  30["Column Regex Find And Replace"];
  0 -->|output| 30;
  31["Column Regex Find And Replace"];
  0 -->|output| 31;
  32["Column Regex Find And Replace"];
  0 -->|output| 32;
  33["Cut"];
  0 -->|output| 33;
  34["Cut"];
  0 -->|output| 34;
  35["Datamash"];
  0 -->|output| 35;
  36["Datamash"];
  0 -->|output| 36;
  37["Datamash"];
  0 -->|output| 37;
  38["Datamash"];
  0 -->|output| 38;
  39["Datamash"];
  0 -->|output| 39;
  40["Count"];
  0 -->|output| 40;
  41["Column Regex Find And Replace"];
  0 -->|output| 41;
  42["Split file"];
  0 -->|output| 42;
  43["Join two Datasets"];
  0 -->|output| 43;
  1 -->|output| 43;
  44["Remove beginning"];
  2 -->|output| 44;
  45["Sort"];
  11 -->|out_file1| 45;
  46["Filter"];
  12 -->|out_file1| 46;
  47["Cut"];
  28 -->|out_file1| 47;
  48["Unique"];
  29 -->|out_file1| 48;
  49["Count"];
  41 -->|out_file1| 49;
  50["Datamash"];
  41 -->|out_file1| 50;
  51["Concatenate datasets"];
  0 -->|output| 51;
  44 -->|out_file1| 51;
  52["Filter"];
  45 -->|outfile| 52;
  53["Sort"];
  46 -->|out_file1| 53;
  54["Remove beginning"];
  47 -->|out_file1| 54;
  55["Select first"];
  47 -->|out_file1| 55;
  56["Sort"];
  48 -->|outfile| 56;
  57["Unique"];
  54 -->|out_file1| 57;
  58["Concatenate datasets"];
  55 -->|out_file1| 58;
  57 -->|outfile| 58;
  59["Join two Datasets"];
  0 -->|output| 59;
  58 -->|out_file1| 59;
  60["Cut"];
  59 -->|out_file1| 60;
  61["Line/Word/Character count"];
  60 -->|out_file1| 61;

Inputs

Input          Label
Input dataset  olympics.tsv
Input dataset  country-information.tsv
Input dataset  olympics_2022.tsv

Outputs

From Output Label

Tools

Tool
Count1
Cut1
Filter1
Remove beginning1
Show beginning1
cat1
join1
tabular_to_csv
toolshed.g2.bx.psu.edu/repos/bgruening/split_file_on_column/tp_split_on_column/0.4
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cat/0.1.1
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/1.1.1
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sorted_uniq/1.1.0
toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.1
toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0
toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0+galaxy2
wc_gnu

To use this workflow in Galaxy, either click the download link to save the workflow file, or right-click the link and copy its URL, which can then be pasted into Galaxy's workflow import form.

Importing into Galaxy

The steps below import the workflow directly into the Galaxy server of your choice.
Hands-on: Importing a workflow
  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click on Import at the top-right of the screen
  • Provide your workflow
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
  • Click the Import workflow button

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL
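
The same import can also be scripted instead of clicked through. The sketch below is one possible approach, assuming the BioBlend client library (`pip install bioblend`) and its `import_workflow_dict` method. The helper name `import_workflow`, the server URL, and the API key in the usage comment are placeholders, not values from this page.

```python
def import_workflow(gi, workflow):
    """Import an already parsed .ga workflow through a BioBlend-style client.

    `gi` is expected to behave like bioblend.galaxy.GalaxyInstance;
    `workflow` is the dict obtained by loading the .ga file as JSON.
    """
    # A Galaxy workflow file carries its steps under a top-level "steps" key;
    # refuse anything else early rather than sending it to the server.
    if "steps" not in workflow:
        raise ValueError("not a Galaxy workflow: missing 'steps'")
    return gi.workflows.import_workflow_dict(workflow)


# Usage (needs a reachable Galaxy server; URL and key are placeholders):
#
#   import json
#   from bioblend.galaxy import GalaxyInstance
#
#   gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")
#   with open("workflow.ga", encoding="utf-8") as handle:
#       imported = import_workflow(gi, json.load(handle))
#   print(imported)
```

Taking the client as a parameter keeps the helper independent of any particular server and makes it easy to try out with a stand-in object.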

Version History

Version Commit Time Comments

For Admins

Installing the workflow tools

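# Download the workflow definition
wget https://training.galaxyproject.org/training-material/topics/data-science/tutorials/data-manipulation-olympics/workflows/data_manipulation_olympics_with_exercises.ga -O workflow.ga
# Extract the tools the workflow needs into a tool list
workflow-to-tools -w workflow.ga -o tools.yaml
# Install those tools (GALAXY = server URL, API_KEY = admin API key)
shed-tools install -g GALAXY -a API_KEY -t tools.yaml
# Import the workflow into the server and publish it
workflow-install -g GALAXY -a API_KEY -w workflow.ga --publish-workflows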
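
Before installing anything, it can help to see which tools the downloaded file actually references. The sketch below is not how `workflow-to-tools` is implemented. It only assumes that a `.ga` file is JSON with a top-level `steps` mapping whose entries carry a `tool_id` (empty for input datasets). The names `tool_ids`, `load_workflow`, and `SAMPLE` are illustrative, and `SAMPLE` is a made-up stand-in, not the real workflow.

```python
import json
from collections import Counter


def load_workflow(path):
    """Read a .ga workflow file (assumed to be plain JSON) into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def tool_ids(workflow):
    """Count how often each tool_id occurs among the workflow's steps."""
    counts = Counter()
    for step in workflow.get("steps", {}).values():
        tool_id = step.get("tool_id")  # None for input dataset steps
        if tool_id:
            counts[tool_id] += 1
    return counts


# Hypothetical miniature workflow, used only to demonstrate the helper.
SAMPLE = {
    "steps": {
        "0": {"type": "data_input", "tool_id": None},
        "1": {"type": "tool", "tool_id": "Filter1"},
        "2": {"type": "tool", "tool_id": "Filter1"},
        "3": {"type": "tool", "tool_id": "wc_gnu"},
    }
}

for tool, uses in sorted(tool_ids(SAMPLE).items()):
    print(f"{tool}\t{uses}")
```

For the real file, `tool_ids(load_workflow("workflow.ga"))` would give the same kind of tally, which can be compared against the generated `tools.yaml`.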