Difference between revisions of "P3s"

From DUNE
'''!!!UNDER CONSTRUCTION!!!'''
 
 
==DAG as a state machine in p3s==
 
A design idea:
 
* have an attribute in the Job class which specifies whether children of that particular node have been created (can be one child, of course)
 
* this assumes that the data which are input to each child have been created and are already available
 
* a special (but trivial) case is the leaf of a DAG: no further jobs need to be generated. Whether a node is a leaf can be determined by looking for edges whose "source" attribute corresponds to this node; the node is a leaf when there are no such edges.
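
The design idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Node` class, the `children_created` attribute name and the dictionary layout of an edge are hypothetical and are not taken from the p3s code. For brevity the sketch does not handle a child that has more than one parent.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for the Job class; only the fields needed here.
    name: str
    children_created: bool = False


def outgoing(node_name, edges):
    """Return the edges whose "source" attribute matches the node."""
    return [e for e in edges if e["source"] == node_name]


def is_leaf(node_name, edges):
    """A node is a leaf when no edge has it as its source."""
    return not outgoing(node_name, edges)


def create_children(node, edges):
    """Create child nodes once and record that fact on the parent."""
    if node.children_created:
        return []  # children were generated earlier; nothing to do
    # dict.fromkeys removes duplicate targets while keeping their order.
    targets = dict.fromkeys(e["target"] for e in outgoing(node.name, edges))
    node.children_created = True
    return [Node(t) for t in targets]


edges = [
    {"id": 1, "source": "reco", "target": "merge"},
    {"id": 2, "source": "merge", "target": "archive"},
]
```

With these edges, `is_leaf("archive", edges)` is true because no edge starts at "archive", and calling `create_children` on a leaf simply returns an empty list.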
 

Revision as of 12:08, 10 January 2017

States of Pilots and Jobs

Pilots

Pilots are created completely independently of the server and contact it via HTTP once they have started. The server searches its database for jobs in the "defined" state, sorts them by priority and sends a reference to a job to the pilot which made the request. In the current version, the job reference contains the path to the executable, its parameters and the part of the job environment which identifies the input and output data by means of environment variables.
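
The matching step described above can be sketched as follows. This is a minimal illustration, not the p3s server code: the `Job` fields, the `dispatch` function, the layout of the returned reference and the `P3S_INPUT` variable name are all hypothetical, and the sketch assumes that a larger number means a higher priority.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    # Hypothetical fields; the actual p3s schema may differ.
    uuid: str
    priority: int
    executable: str
    params: str = ""
    env: dict = field(default_factory=dict)
    state: str = "defined"


def dispatch(jobs, pilot_id) -> Optional[dict]:
    """Pick the highest-priority "defined" job for the requesting pilot."""
    candidates = [j for j in jobs if j.state == "defined"]
    if not candidates:
        return None  # nothing to hand out to this pilot
    job = max(candidates, key=lambda j: j.priority)
    job.state = "dispatched"
    # The job reference: executable path, parameters and the environment
    # variables that point at the input and output data.
    return {
        "job": job.uuid,
        "pilot": pilot_id,
        "executable": job.executable,
        "params": job.params,
        "env": job.env,
    }


jobs = [
    Job("a", 1, "/bin/true"),
    Job("b", 5, "/bin/echo", "hello", {"P3S_INPUT": "/tmp/in"}),
    Job("c", 9, "/bin/false", state="template"),
]
```

In this example the first request receives job "b": job "c" has a higher priority but is still in the "template" state and is therefore skipped.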

States:

  • active
  • dispatched
  • running
  • finished (completion of a job)
  • stopped (timeout w/o getting a job)

Status:

  • OK
  • FAIL
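
As an illustration of how a pilot might end up in the "dispatched" or "stopped" state, here is a small sketch. It assumes that the pilot polls the server repeatedly until a timeout expires; the polling scheme, the `wait_for_job` function and its parameters are assumptions made for this example and are not documented p3s behaviour.

```python
import time
from enum import Enum


class PilotState(Enum):
    ACTIVE = "active"
    DISPATCHED = "dispatched"
    RUNNING = "running"
    FINISHED = "finished"
    STOPPED = "stopped"


def wait_for_job(request_job, timeout, period):
    """Poll for a job; return (state, job) once one arrives or time runs out.

    request_job is any callable returning a job reference or None
    (in a real pilot this would be an HTTP request to the server).
    """
    deadline = time.monotonic() + timeout
    while True:
        job = request_job()
        if job is not None:
            return PilotState.DISPATCHED, job
        if time.monotonic() >= deadline:
            # Timed out without getting a job.
            return PilotState.STOPPED, None
        time.sleep(period)
```

A pilot would start in `PilotState.ACTIVE`, call `wait_for_job`, and then either run the job it was given or exit in the "stopped" state.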

Jobs

States:

  • template
  • defined
  • dispatched (sent to a pilot for execution)
  • running
  • finished

Events:

  • jobstart
  • jobstop
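
The job states and events listed above suggest a simple state machine. The sketch below assumes that "jobstart" moves a job from "dispatched" to "running" and that "jobstop" moves it from "running" to "finished"; this mapping is an interpretation of the lists and is not stated in the text. The names `EVENT_TRANSITIONS` and `apply_event` are hypothetical.

```python
JOB_STATES = ["template", "defined", "dispatched", "running", "finished"]

# Assumed mapping: event -> (state required before, state after).
EVENT_TRANSITIONS = {
    "jobstart": ("dispatched", "running"),
    "jobstop": ("running", "finished"),
}


def apply_event(state, event):
    """Return the new job state, or raise if the event does not fit."""
    expected, new_state = EVENT_TRANSITIONS[event]
    if state != expected:
        raise ValueError(f"{event!r} is not valid in state {state!r}")
    return new_state
```

Rejecting an out-of-order event (for example "jobstop" for a job that was never started) makes inconsistent reports from a pilot visible instead of silently overwriting the state.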

Matching jobs to pilots

Jobs are created in the "template" state and won't be matched

DAG

DAG as a template

  • DAG describes the topology and general properties of a workflow, and serves as a template for workflows. The system stores multiple templates referred to by their names.
  • Vertex and Edge tables: vertices are jobs and the edges are data. The edge class has the following attributes at a minimum:
    • ID
    • Source node
    • Target node
  • Leaves of a DAG: a leaf can only be a job, not data (since all data are edges and not vertices). This also has the benefit of not leaving final data unaccounted for: in most cases it must be either flushed or moved to permanent storage. The job/task responsible for either of these operations forms the leaf.
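
To make the template idea concrete, here is a minimal sketch of named DAG templates and of creating a workflow from one. The class names, the `TEMPLATES` registry, the `instantiate` function and the "reco"/"merge"/"archive" job names are hypothetical; only the three edge attributes (ID, source node, target node) come from the description above.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    id: int
    source: str  # name of the job producing the data
    target: str  # name of the job consuming the data


@dataclass(frozen=True)
class DagTemplate:
    name: str
    vertices: tuple  # job names
    edges: tuple     # Edge records (the data)


# Templates are stored and looked up by name.
TEMPLATES = {}


def instantiate(name):
    """Create the jobs and data records of one workflow from a template."""
    template = TEMPLATES[name]
    # Give every job in this workflow its own identifier.
    ids = {v: f"{v}-{uuid.uuid4().hex[:8]}" for v in template.vertices}
    jobs = [{"id": ids[v], "name": v, "state": "template"}
            for v in template.vertices]
    data = [{"edge": e.id, "source": ids[e.source], "target": ids[e.target]}
            for e in template.edges]
    return jobs, data


TEMPLATES["demo"] = DagTemplate(
    name="demo",
    vertices=("reco", "merge", "archive"),
    edges=(Edge(1, "reco", "merge"), Edge(2, "merge", "archive")),
)
```

Each call to `instantiate("demo")` yields a separate workflow with the same topology, which is the sense in which the DAG acts as a template. In this example "archive" is the leaf: it has no outgoing edge and stands for the job that moves the final data to permanent storage.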