Difference between revisions of "Metadata Broker"

From Gcube Wiki
(Usage Example)
 
(28 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
[[Category:TO BE REMOVED]]
 +
 +
[[Image:Alert_icon2.gif]] ''THIS SECTION OF GCUBE DOCUMENTATION IS CURRENTLY UNDER UPDATE.''
 +
 
== Metadata Broker ==
 
== Metadata Broker ==
 
=== Introduction ===
 
=== Introduction ===
Line 9: Line 13:
  
 
Each transformation program consists of:
 
Each transformation program consists of:
* One or more input definitions. Each one may be a:
+
* One or more data input definitions. Each one defines the schema, language and type (record, ResultSet or collection) of the data that must be mapped to the particular input.
** Data input: accepts a reference to a ResultSet, Collection or Record to be transformed. It also contains the schema-id and language-id that the input data should have.
+
* One or more input variables. Each one of them is a placeholder for an additional string value which must be passed to the transformation program at run-time.
** Input variable: a placeholder for an additional string value which must be passed to the transformation program at run-time.  
+
* Exactly one data output definition, which contains the output data type (record, ResultSet or collection), schema and language.
* Exactly one output definition, which contains the output data type (Record, ResultSet or collection), schema and language.
+
 
* One or more transformation rule definitions.
 
* One or more transformation rule definitions.
  
Note: The name of the input or output schema must be given in the format '''NAME=URI''', where NAME is the name of the schema and URI is the URI of its definition, e.g. '''<nowiki>DC=http://dublincore.org/schemas/xmls/simpledc20021212.xsd</nowiki>'''.
+
'''Note''': The name of the input or output schema must be given in the format '''''SchemaName=SchemaURI''''', where SchemaName is the name of the schema and SchemaURI is the URI of its definition, e.g. '''<nowiki>DC=http://dublincore.org/schemas/xmls/simpledc20021212.xsd</nowiki>'''.
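The '''''SchemaName=SchemaURI''''' convention can be checked on the client side before a request is sent. The following is only a minimal sketch: the <tt>SchemaDescriptor</tt> class is hypothetical and is not part of the metadata broker library. It assumes that the descriptor is split at the first '=' character, so that a URI which itself contains '=' is kept intact.

```java
/** Hypothetical helper, not part of the metadata broker library. */
final class SchemaDescriptor {
    final String name;
    final String uri;

    private SchemaDescriptor(String name, String uri) {
        this.name = name;
        this.uri = uri;
    }

    /** Splits "SchemaName=SchemaURI" at the first '=' (assumption of this sketch). */
    static SchemaDescriptor parse(String descriptor) {
        int sep = descriptor.indexOf('=');
        // Reject descriptors with no '=', an empty name or an empty URI.
        if (sep <= 0 || sep == descriptor.length() - 1) {
            throw new IllegalArgumentException(
                    "Expected SchemaName=SchemaURI but got: " + descriptor);
        }
        return new SchemaDescriptor(descriptor.substring(0, sep), descriptor.substring(sep + 1));
    }

    public static void main(String[] args) {
        SchemaDescriptor dc =
                parse("DC=http://dublincore.org/schemas/xmls/simpledc20021212.xsd");
        System.out.println(dc.name + " -> " + dc.uri);
    }
}
```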
  
 
==== Transformation Rules ====
 
==== Transformation Rules ====
Line 22: Line 25:
  
 
Each transformation rule consists of:
 
Each transformation rule consists of:
* One or more inputs definitions. Each definition contains the schema, language, type (record, ResultSet, collection or variable)and data reference of the input it describes. Each one of these elements (except for the 'type' element) can be either a literal value, or a reference to another value defined inside the transformation program (using XPath syntax).
+
* One or more data input definitions. Each definition contains the schema, language, type (record, ResultSet, collection or input variable) and data reference of the input it describes. Each one of these elements (except for the 'type' element) can be either a literal value, or a reference to another value defined inside the transformation program (using XPath syntax).
* Exactly one output, which can be:
+
* Exactly one data output, which can be:
** A definition that contains the output data type (Record, ResultSet or collection), schema and language.
+
** A definition that contains the output data type (record, ResultSet or collection), schema and language.
 
** A reference to the transformation program's output (using XPath syntax). This is the way to express that the output of this transformation rule will also be the output of the whole transformation program, so such a reference is only valid for the transformation program's final rule.
 
** A reference to the transformation program's output (using XPath syntax). This is the way to express that the output of this transformation rule will also be the output of the whole transformation program, so such a reference is only valid for the transformation program's final rule.
* The name of the underlying [[Metadata Broker#Programs|program]] to execute in order to do the transformation.
+
* The name of the underlying [[Metadata Broker#Programs|program]] to execute in order to do the transformation, using standard '<tt>packageName.className</tt>' syntax.
 +
 
 +
A transformation rule can also be a reference to another transformation program. This way, whole transformation programs can be used as parts of the execution of another transformation program. The reference can be made using the unique id of the transformation program being referenced and a set of value assignments to its data inputs and variables.
 +
 
 +
'''Note''': The name of the input or output schema must be given in the format '''''SchemaName=SchemaURI''''', where SchemaName is the name of the schema and SchemaURI is the URI of its definition, e.g. '''<nowiki>DC=http://dublincore.org/schemas/xmls/simpledc20021212.xsd</nowiki>'''.
 +
 
 +
==== Variable fields inside data input/output definitions ====
 +
 
 +
Inside the definition of data inputs and outputs of transformation programs and transformation rules, any field except for 'Type' can be declared as a variable field. Just like input variables, variable fields get their values through run-time assignments. To declare an element as a variable field of its parent element, include '<tt>isVariable=true</tt>' in the element's definition. When the caller invokes a broker operation in order to transform some metadata, he/she can provide a set of value assignments to the input variables and variable fields of the transformation program definition. The caller, however, has access only to the variables of the whole transformation program, not to those of its internal transformation rules. Transformation rules can also contain variable fields in their input/output definitions. Since the caller cannot explicitly assign values to them, each such variable field must contain an XPath expression as its value, pointing to another element inside the transformation program that holds the value to be assigned. These references are resolved when each transformation rule is executed. If, for example, a variable field of a transformation rule's input definition points to a variable field of the previous transformation rule's output definition, the referenced element's value is guaranteed to be available when the second transformation rule is executed. Every XPath expression must specify an absolute location inside the document, which means that it must start with '/'.
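The way a variable field of a transformation rule picks up its value through an XPath reference can be illustrated with the standard <tt>javax.xml.xpath</tt> API. This is a sketch under stated assumptions and not the service's actual implementation: the <tt>VariableFieldResolver</tt> class and the trimmed-down transformation program in <tt>main</tt> are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of the metadata broker library. */
final class VariableFieldResolver {

    /** Parses a transformation program definition given as an XML string. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse transformation program", e);
        }
    }

    /** Returns the text of the first element that an absolute XPath reference points to. */
    static String resolve(Document program, String reference) {
        String expr = reference.trim();
        if (!expr.startsWith("/")) {
            throw new IllegalArgumentException("Reference must be an absolute XPath: " + expr);
        }
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, program).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot resolve reference " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Made-up program: the rule input points at the program input's Reference field.
        String xml =
              "<TransformationProgram>"
            + "<Input name='TPInput'><Reference isVariable='true'>input-locator</Reference></Input>"
            + "<TransformationRule><Definition><Input name='RuleInput'>"
            + "<Reference isVariable='true'>//Input[@name='TPInput']/Reference</Reference>"
            + "</Input></Definition></TransformationRule>"
            + "</TransformationProgram>";
        Document program = parse(xml);
        // First read the XPath stored in the rule's variable field, then follow it.
        String reference = resolve(program, "//Input[@name='RuleInput']/Reference");
        System.out.println(resolve(program, reference));
    }
}
```

Resolving the reference only at the moment a rule runs is what allows a rule to read a value that an earlier rule has just written.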
  
Note: The name of the input or output schema must be given in the format '''NAME=URI''', where NAME is the name of the schema and URI is the URI of its definition, e.g. '''<nowiki>DC=http://dublincore.org/schemas/xmls/simpledc20021212.xsd</nowiki>'''.
+
There is a special case in which the <tt>language</tt> and <tt>schema</tt> fields of a transformation program's data input definition have values assigned to them automatically, without the caller having to provide them. This happens when the type of the data input is set to <tt>collection</tt>. In this case, the Metadata Broker service retrieves the format of the metadata collection identified by the ID given in the <tt>Reference</tt> field of the data input definition, and assigns the collection's actual schema descriptor and language identifier to the respective variable fields of the data input definition. If any of these fields already contains a value, that value is compared with the one retrieved from the metadata collection's profile; if they differ, execution of the transformation program stops and the Metadata Broker service throws an exception. Note that the automatic value assignment works only on data inputs of transformation programs and NOT on data inputs of individual transformation rules.
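The assign-or-compare behaviour described for <tt>collection</tt> inputs can be summarized in a few lines. The sketch below is illustrative only: <tt>CollectionFormatCheck</tt> is a hypothetical class, and the use of <tt>IllegalStateException</tt> is an assumption, since this page does not say which exception type the service throws.

```java
/** Hypothetical helper, not part of the metadata broker library. */
final class CollectionFormatCheck {

    /**
     * Returns the value to use for a schema or language field of a collection input.
     * 'declared' is the value found in the data input definition (may be empty),
     * 'retrieved' is the value read from the metadata collection's profile.
     */
    static String reconcile(String fieldName, String declared, String retrieved) {
        if (declared == null || declared.trim().isEmpty()) {
            // Nothing declared: take the value from the collection's profile.
            return retrieved;
        }
        if (!declared.trim().equals(retrieved)) {
            // Declared value contradicts the profile: stop the execution.
            throw new IllegalStateException(fieldName + " mismatch: declared '"
                    + declared.trim() + "' but the collection profile reports '"
                    + retrieved + "'");
        }
        return retrieved;
    }

    public static void main(String[] args) {
        System.out.println(reconcile("Language", "", "en"));
        System.out.println(reconcile("Language", "en", "en"));
    }
}
```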
  
 
==== Programs ====
 
==== Programs ====
Line 60: Line 71:
 
** <tt>transform(TransformationProgramID, params) -> String</tt><br>This operation takes the DiligentID of a transformation program stored in the DIS and a set of transformation parameters. The referenced transformation program is executed using the provided parameters, which are just a set of value assignments to variables defined inside the transformation program. The metadata broker library contains a helper class for creating such a parameter set.
 
** <tt>transform(TransformationProgramID, params) -> String</tt><br>This operation takes the DiligentID of a transformation program stored in the DIS and a set of transformation parameters. The referenced transformation program is executed using the provided parameters, which are just a set of value assignments to variables defined inside the transformation program. The metadata broker library contains a helper class for creating such a parameter set.
 
** <tt>transformWithNewTP(TransformationProgram, params) -> String</tt><br>This operation offers the same functionality as the previous one. However, in this case the first parameter is the full XML definition of a transformation program in string format and not the DiligentID of a stored one.
 
** <tt>transformWithNewTP(TransformationProgram, params) -> String</tt><br>This operation offers the same functionality as the previous one. However, in this case the first parameter is the full XML definition of a transformation program in string format and not the DiligentID of a stored one.
** <tt>findPossibleTransformationPrograms (InputDesc, OutputDesc) -> TransformationProgram[]</tt><br>This operation takes the description of some input format (type, language and schema) as well as the description of a desired output format, and returns an array of transformation programs definitions that could be used in order to perform the required conversion. These transformation programs may not exist before invoking this operation. They are produced on the fly, by combining all the existing transformation programs which are compatible with each other, trying to synthesize more complex transformation programs. Of course, if there is already an existing transformation program which is applicable for the requested type of transformation, it is included in the results.<br><br>
+
** <tt>findPossibleTransformationPrograms (InputDesc, OutputDesc) -> TransformationProgram[]</tt><br>This operation takes the description of some input format (type, language and schema) as well as the description of a desired output format, and returns an array of transformation program definitions that could be used in order to perform the required conversion. These transformation programs may not exist before invoking this operation. They are produced on the fly, by combining all the existing transformation programs which are compatible with each other, trying to synthesize more complex transformation programs. Of course, if there is already an existing transformation program which is applicable for the requested type of transformation, it is included in the results. If the output format is null, then the returned array contains all transformation programs that can be applied to the specified input format, producing any possible output format.<br><br>
  
 
* '''The metadata broker library'''<br>The metadata broker library contains the definitions of the RecordType, CollectionType, ResultSetType and VariableType Java classes, as well as the definition of the Program Java interface. The following programs are also included in it:
 
* '''The metadata broker library'''<br>The metadata broker library contains the definitions of the RecordType, CollectionType, ResultSetType and VariableType Java classes, as well as the definition of the Program Java interface. The following programs are also included in it:
Line 87: Line 98:
 
** Metadata catalog library
 
** Metadata catalog library
  
=== Usage Example ===
+
=== Usage Examples ===
  
The following examples use the <tt>transform</tt> operation of the metadata broker service.
+
The following examples show how some of the transformation programs contained in the metadata broker library can be used. Each example gives the client-side code needed to invoke the operations of the metadata broker service, together with the full definitions of the referenced programs and transformation programs. These definitions can serve as a starting point for writing new programs and transformation programs.
  
 
==== Transforming a single record using a XSLT ====
 
==== Transforming a single record using a XSLT ====
  
This is the GXSLT_Rec2Rec class (included in the metadata broker library), which performs the actual conversion:
+
This is the <tt>GXSLT_Rec2Rec</tt> class (included in the metadata broker library), which performs the actual conversion:
  
 
<pre>
 
<pre>
Line 102: Line 113:
 
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.RecordType;
 
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.RecordType;
 
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.VariableType;
 
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.VariableType;
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.util.GenericResourceRetriever;
  
 
import java.io.StringReader;
 
import java.io.StringReader;
Line 123: Line 135:
 
public void transform(RecordType record, VariableType xslt, RecordType outRecord) throws RemoteException {
 
public void transform(RecordType record, VariableType xslt, RecordType outRecord) throws RemoteException {
 
try {
 
try {
            Transformer t = factory.newTransformer(new StreamSource(new StringReader(xslt.getReference())));
+
                    String xsltdef = GenericResourceRetriever.retrieveGenericResource(xslt.getReference());
 +
            Transformer t = factory.newTransformer(new StreamSource(new StringReader(xsltdef)));
 
             t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
 
             t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
 
            output = new StringWriter();
 
            output = new StringWriter();
Line 197: Line 210:
 
</pre>
 
</pre>
  
In this example, the transformation program defined above is stored in the DIS as a profile with UniqueID=<tt>ce6b9860-ebfe-11db-8b69-dd428ed9686d</tt>. The input record that is going to be transformed is stored in a local file named <tt>input.xml</tt>, and the XSLT that will be used is defined by the file <tt>xslt.xml</tt>. The following code fragment reads the contents of these two files, creates a set of parameters which are used in order to assign the input data and the XSLT definition to the respective transformation program variable inputs, and then invokes the <tt>transform</tt> operation of the metadata broker service. The result is written to the console. The URI of the remote service is given as a command-line argument.
+
In this example, the transformation program defined above is stored in the DIS as a profile with UniqueID=<tt>ce6b9860-ebfe-11db-8b69-dd428ed9686d</tt>. The input record that is going to be transformed is stored in a local file named <tt>input.xml</tt>, and the XSLT that will be used is stored as a generic resource with UniqueID=<tt>ed358e00-23f2-11dc-a35f-9c01d805f283</tt> in the DIS. The following code fragment reads the input record from the file, creates a set of parameters which are used in order to assign the input data and the XSLT ID to the respective transformation program variable inputs, and then invokes the <tt>transform</tt> operation of the metadata broker service. The result is written to the console. The URI of the remote service is given as a command-line argument.
  
 
<pre>
 
<pre>
Line 210: Line 223:
 
// Read the input data file into a string
 
// Read the input data file into a string
 
String inputData = readTextFile("input.xml");
 
String inputData = readTextFile("input.xml");
 
// Read the XSLT file into a string
 
String XSLTDefinition = readTextFile("xslt.xml");
 
 
 
 
// Create a set of transformation parameters, assigning values to variables
 
// Create a set of transformation parameters, assigning values to variables
Line 222: Line 232:
 
tparams.addParameter("//Output[@name='TPOutput']/Schema", "Schema2=URI2");
 
tparams.addParameter("//Output[@name='TPOutput']/Schema", "Schema2=URI2");
 
tparams.addParameter("//Output[@name='TPOutput']/Language", "en");
 
tparams.addParameter("//Output[@name='TPOutput']/Language", "en");
tparams.addParameter("//Variable[@name='XSLT']", XSLTDefinition);
+
tparams.addParameter("//Variable[@name='XSLT']", "ed358e00-23f2-11dc-a35f-9c01d805f283");
  
 
// Prepare the invocation parameters
 
// Prepare the invocation parameters
Line 250: Line 260:
  
 
==== Transforming an entire ResultSet using a XSLT ====
 
==== Transforming an entire ResultSet using a XSLT ====
 +
 +
This is the definition of the <tt>GXSLT_RS2RS</tt> class (included in the metadata broker library), which performs the actual conversion:
 +
 +
<pre>
 +
package org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_RS2RS;
 +
 +
import java.rmi.RemoteException;
 +
 +
import org.apache.log4j.Logger;
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.Program;
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.ResultSetType;
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.VariableType;
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_RS2RS.GXSLT_RS2RS_Worker;
 +
import org.diligentproject.searchservice.searchlibrary.rsclient.elements.RSResourceWSRFType;
 +
import org.diligentproject.searchservice.searchlibrary.rswriter.RSXMLWriter;
 +
 +
public class GXSLT_RS2RS implements Program {
 +
private static Logger log = Logger.getLogger(GXSLT_RS2RS.class);
 +
private String output = null;
 +
 +
public void transform(ResultSetType RS, VariableType xslt, ResultSetType outRS) throws RemoteException {
 +
try {
 +
RSXMLWriter writer = RSXMLWriter.getRSXMLWriter();
 +
new GXSLT_RS2RS_Worker(RS, writer, xslt).start();
 +
output = writer.getRSLocator(new RSResourceWSRFType()).getLocator();
 +
} catch (Exception e) {
 +
log.error("GXSLT_RS2RS: Failed to create writer for output resultset.", e);
 +
throw new RemoteException("GXSLT_RS2RS: Failed to create writer for output resultset.", e);
 +
}
 +
}
 +
 +
public String getOutput() {
 +
return this.output;
 +
}
 +
 +
}
 +
</pre>
 +
 +
As stated before, bulk transformations are non-blocking. For this reason, the above code spawns a new thread to handle the transformation process. The definition of the <tt>GXSLT_RS2RS_Worker</tt> class (which extends the <tt>Thread</tt> class) follows.
 +
 +
<pre>
 +
package org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_RS2RS;
 +
 +
import javax.xml.transform.Templates;
 +
import org.apache.log4j.Logger;
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.ResultSetType;
 +
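import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.VariableType;
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.util.GenericResourceRetriever;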
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_Rec2Rec.GXSLT_Rec2Rec;
 +
import org.diligentproject.searchservice.searchlibrary.resultset.elements.ResultElementGeneric;
 +
import org.diligentproject.searchservice.searchlibrary.rsclient.elements.RSLocator;
 +
import org.diligentproject.searchservice.searchlibrary.rsclient.elements.RSResourceLocalType;
 +
import org.diligentproject.searchservice.searchlibrary.rsreader.RSXMLIterator;
 +
import org.diligentproject.searchservice.searchlibrary.rsreader.RSXMLReader;
 +
import org.diligentproject.searchservice.searchlibrary.rswriter.RSXMLWriter;
 +
 +
class GXSLT_RS2RS_Worker extends Thread {
 +
private static Logger log = Logger.getLogger(GXSLT_RS2RS_Worker.class);
 +
private ResultSetType RS;
 +
private RSXMLWriter writer;
 +
private VariableType xslt;
 +
 +
public GXSLT_RS2RS_Worker(ResultSetType resultSet, RSXMLWriter RSWriter, VariableType xsltToUse) {
 +
RS = resultSet;
 +
writer = RSWriter;
 +
xslt = xsltToUse;
 +
}
 +
 +
public void run() {
 +
String element = null;
 +
int i = 0;
 +
 +
try {
 +
/* Compile the XSLT so that the records in the resultset will be transformed faster */
 +
                        String xsltdef = GenericResourceRetriever.retrieveGenericResource(xslt.getReference());
 +
Templates compiledXSLT = GXSLT_Rec2Rec.compileXSLT(xsltdef);
 +
 +
/* Read each record of the input ResultSet, use the GXSLT_Rec2Rec program in order to transform
 +
* it and add it to the output ResultSet. */
 +
GXSLT_Rec2Rec GXSLTRecProgram = new GXSLT_Rec2Rec();
 +
RSXMLReader reader = RSXMLReader.getRSXMLReader(new RSLocator(RS.toString()));
 +
RSXMLIterator iter = reader.makeLocalPatiently(new RSResourceLocalType(), 1200000).getRSIterator(1200000);
 +
while(iter.hasNext()) {
 +
if (writer.isTimerAlive())
 +
writer.resetTimer();
 +
ResultElementGeneric elem = (ResultElementGeneric)iter.next(ResultElementGeneric.class);
 +
if (elem == null)
 +
continue;
 +
element = elem.getPayload();
 +
GXSLTRecProgram.transform(element, compiledXSLT);
 +
writer.addResults(new ResultElementGeneric(elem.getRecordAttributes(ResultElementGeneric.RECORD_ID_NAME)[0].getAttrValue(),
 +
elem.getRecordAttributes(ResultElementGeneric.RECORD_COLLECTION_NAME)[0].getAttrValue(),
 +
GXSLTRecProgram.getOutput()));
 +
i++;
 +
element = null;
 +
}
 +
writer.close();
 +
} catch (Exception e) {
 +
if (element != null)
 +
i++;
 +
log.error("GXSLT_RS2RS: Failed to transform the given resultset. Stopped at element " + String.valueOf(i) + ":\n" + element, e);
 +
e.printStackTrace();
 +
try {
 +
writer.close();
 +
} catch (Exception e1) {
 +
log.error("GXSLT_RS2RS: Failed to close resultset.");
 +
}
 +
}
 +
}
 +
}
 +
</pre>
 +
 +
The above code uses the <tt>GXSLT_Rec2Rec</tt> program to compile the XSLT so that the transformation executes as fast as possible. Then it iterates over the whole set of elements contained in the ResultSet, transforming each one using the compiled XSLT. Each transformed element is then added to the output ResultSet.
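The "compile once, transform many" idea can be shown with nothing but the standard <tt>javax.xml.transform</tt> API. The sketch below does not use the gCube ResultSet classes: the <tt>CompiledXsltSketch</tt> class, the sample stylesheet and the sample records are all made up for illustration, and plain strings stand in for ResultSet elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical stand-alone illustration, not part of the metadata broker library. */
final class CompiledXsltSketch {

    /** Compiles the stylesheet once and applies it to every record. */
    static List<String> transformAll(String xslt, List<String> records) {
        try {
            // Compilation is the expensive step, so it is done a single time.
            Templates compiled = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
            List<String> results = new ArrayList<>();
            for (String record : records) {
                // A Transformer is cheap to create from compiled Templates.
                Transformer t = compiled.newTransformer();
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(record)), new StreamResult(out));
                results.add(out.toString());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xslt =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='/rec'><out><xsl:value-of select='title'/></out></xsl:template>"
            + "</xsl:stylesheet>";
        List<String> records = Arrays.asList(
                "<rec><title>A</title></rec>", "<rec><title>B</title></rec>");
        for (String result : transformAll(xslt, records)) {
            System.out.println(result);
        }
    }
}
```

A <tt>Templates</tt> object is safe to share between threads, whereas each <tt>Transformer</tt> created from it should be used by one thread at a time.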
  
 
The following is the XML definition of the transformation program used for this type of transformation.
 
The following is the XML definition of the transformation program used for this type of transformation.
Line 290: Line 412:
 
</pre>
 
</pre>
  
In this example, the transformation program defined above is stored in the DIS as a profile with UniqueID=<tt>eb46fc40-ebfe-11db-8b6b-dd428ed9686d</tt>. The EPR of the input ResultSet that is going to be transformed is stored in a local file named <tt>input.xml</tt>, and the XSLT that will be used is defined by the file <tt>xslt.xml</tt>. The URI of the remote service is given as a command-line argument. The client code that invokes the broker service and performs the transformation is the same as in the previous example. The only thing that changes is the ID of the transformation program that is called, which should be set to <tt>eb46fc40-ebfe-11db-8b6b-dd428ed9686d</tt>.
+
In this example, the transformation program defined above is stored in the DIS as a profile with UniqueID=<tt>eb46fc40-ebfe-11db-8b6b-dd428ed9686d</tt>. The EPR of the input ResultSet that is going to be transformed is stored in a local file named <tt>input.xml</tt>, and the XSLT is the same one used in the previous example. The URI of the remote service is given as a command-line argument. The client code that invokes the broker service and performs the transformation is the same as in the previous example. The only thing that changes is the ID of the transformation program that is called, which should be set to <tt>eb46fc40-ebfe-11db-8b6b-dd428ed9686d</tt>.
 +
 
 +
==== Using a transformation program within another transformation program ====
 +
 
 +
As stated before, whole transformation programs can be used as 'black-box' components inside another transformation program. This can be done by defining a transformation rule which describes the call to the second transformation program.
 +
 
 +
The transformation program that will be called from another transformation program in this example is defined below.
 +
 
 +
<pre>
 +
<TransformationProgram>
 +
    <Input name="TPInput">
 +
        <Schema>SCH1=http://schema1.xsd</Schema>
 +
        <Language>en</Language>
 +
        <Type>resultset</Type>
 +
        <Reference isVariable="true" />
 +
    </Input>
 +
    <Output name="TPOutput">
 +
        <Schema>SCH3=http://schema3.xsd</Schema>
 +
        <Language>en</Language>
 +
        <Type>resultset</Type>
 +
    </Output>
 +
    <TransformationRule>
 +
        <Definition>
 +
            <Transformer>org.diligentproject.program2</Transformer>
 +
            <Input name="Rule2Input">
 +
                <Schema isVariable="true">//Output[@name='TPRule1Output']/Definition/Schema</Schema>
 +
                <Language isVariable="true">//Output[@name='TPRule1Output']/Definition/Language</Language>
 +
                <Type>resultset</Type>
 +
                <Reference isVariable="true">//Output[@name='TPRule1Output']/Definition/Reference</Reference>
 +
            </Input>
 +
            <Output name="Rule2Output">
 +
                <Reference>//Output[@name='TPOutput']</Reference>
 +
            </Output>
 +
        </Definition>
 +
    </TransformationRule>
 +
</TransformationProgram>
 +
</pre>
 +
 
 +
The input and output schemas and languages are predefined inside this transformation program, so the only thing that should be specified at run-time is the actual input data reference. Let's say that this transformation program is stored in the DIS and its UniqueID is <tt>910e0710-f251-11db-88f9-f971eaf0d653</tt>.
 +
 
 +
The transformation program that uses the above transformation program is defined below.
 +
 
 +
<pre>
 +
<TransformationProgram>
 +
    <Input name="TPInput">
 +
        <Schema>SCH1=http://schema1.xsd</Schema>
 +
        <Language>en</Language>
 +
        <Type>resultset</Type>
 +
        <Reference isVariable="true" />
 +
    </Input>
 +
    <Variable name="var1"/>
 +
    <Output name="TPOutput">
 +
        <Schema>SCH2=http://schema2.xsd</Schema>
 +
        <Language>en</Language>
 +
        <Type>resultset</Type>
 +
    </Output>
 +
    <TransformationRule>
 +
        <Reference>
 +
            <Program>910e0710-f251-11db-88f9-f971eaf0d653</Program>
 +
            <Value isVariable="true" target="//Input[@name='TPInput']/Reference">//Input[@name='TPInput']/Reference</Value>
 +
            <Output name="Rule1Output" />
 +
        </Reference>
 +
    </TransformationRule>
 +
    <TransformationRule>
 +
        <Definition>
 +
            <Transformer>org.diligentproject.program1</Transformer>
 +
            <Input name="Rule2Input1">
 +
                <Schema isVariable="true">//Output[@name='TPRule1Output']/Definition/Schema</Schema>
 +
                <Language isVariable="true">//Output[@name='TPRule1Output']/Definition/Language</Language>
 +
                <Type>resultset</Type>
 +
                <Reference isVariable="true">//Output[@name='TPRule1Output']/Definition/Reference</Reference>
 +
            </Input>
 +
            <Input name="Rule2Input2">
 +
                <Schema />
 +
                <Language />
 +
                <Type>variable</Type>
 +
                <Reference isVariable="true"> //Variable[@name='var1'] </Reference>
 +
            </Input>
 +
            <Output name="Rule2Output">
 +
                <Reference>//Output[@name='TPOutput']</Reference>
 +
            </Output>
 +
        </Definition>
 +
    </TransformationRule>
 +
</TransformationProgram>
 +
</pre>
 +
 
 +
The element that describes the call to the first transformation program is the first <tt>TransformationRule</tt> element. This element specifies the UniqueID of the transformation program to be called, as well as a mapping of values to the variable inputs of that transformation program. Since the called transformation program contains only one variable input (the input data reference), there is only one mapping in this example, described by a <tt>Value</tt> element. The <tt>target</tt> attribute of this element specifies the element of the called transformation program whose value is to be set, and the element's content specifies the value to set. In this example, the content is not a literal value but a reference to another element of the calling transformation program, from which the value is taken. Specifically, the input of the calling transformation program is passed on as the input of the called transformation program. Since the content of the <tt>Value</tt> element is an XPath expression, the <tt>isVariable</tt> attribute is also set to <tt>true</tt>, meaning that the content should be interpreted as a reference to another element and not as a literal value.
 +
The output of the first transformation program becomes the output of the transformation rule that called it, and is named <tt>Rule1Output</tt>. This output is then used as the input of the next transformation rule.
 +
 
 +
==== Finding a set of transformation programs given a source and target metadata formats ====
 +
 
 +
This example demonstrates how one can get an array of transformation programs that could be used in order to transform metadata from a given source format to a given target format. The operation that can be used in order to accomplish this is '<tt>findPossibleTransformationPrograms</tt>'. The caller must specify a source and target metadata format and the service searches for possible "chains" of existing transformation programs that could be used in order to carry out the transformation. There are three rules imposed by the Metadata Broker service:
 +
* Only transformation programs with one data input are considered during the search
 +
* Each transformation program can be used at most one time inside each chain of transformation programs (this is needed in order to avoid infinite loops)
 +
* A transformation program that produces a collection as its output can only be the last one inside a chain of transformation programs
 +
 
 +
Each chain composed by the Metadata Broker service is converted to a transformation program, which "links" the individual transformation programs forming the chain. This transformation program contains a transformation rule for each transformation program in the chain. Each transformation rule describes a call to the corresponding transformation program. The result of the operation is an array of strings, where each string corresponds to a synthesized transformation program.
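The three rules can be read as constraints on a depth-first search over the stored transformation programs. The following sketch is one possible reading and not the service's actual algorithm. Everything in it is an assumption made for illustration: the <tt>ChainSearchSketch</tt> and <tt>Program</tt> classes, the reduction of a format (type, schema, language) to a single string, and the decision to stop extending a chain as soon as it reaches the target format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration, not part of the metadata broker service. */
final class ChainSearchSketch {

    /** Simplified view of a stored transformation program. */
    static final class Program {
        final String id;
        final int dataInputs;
        final String inputFormat;
        final String outputFormat;
        final boolean outputsCollection;

        Program(String id, int dataInputs, String inputFormat,
                String outputFormat, boolean outputsCollection) {
            this.id = id;
            this.dataInputs = dataInputs;
            this.inputFormat = inputFormat;
            this.outputFormat = outputFormat;
            this.outputsCollection = outputsCollection;
        }
    }

    /** Returns every chain (as a list of program ids) leading from source to target. */
    static List<List<String>> findChains(List<Program> programs, String source, String target) {
        List<List<String>> chains = new ArrayList<>();
        extend(programs, source, target, new ArrayList<Program>(), chains);
        return chains;
    }

    private static void extend(List<Program> programs, String current, String target,
                               List<Program> chain, List<List<String>> chains) {
        for (Program p : programs) {
            if (p.dataInputs != 1) continue;               // rule 1: single data input only
            if (chain.contains(p)) continue;               // rule 2: at most once per chain
            if (!p.inputFormat.equals(current)) continue;  // formats must be compatible
            chain.add(p);
            if (p.outputFormat.equals(target)) {
                List<String> ids = new ArrayList<>();
                for (Program q : chain) ids.add(q.id);
                chains.add(ids);                           // complete chain found
            } else if (!p.outputsCollection) {             // rule 3: collection output ends a chain
                extend(programs, p.outputFormat, target, chain, chains);
            }
            chain.remove(chain.size() - 1);                // backtrack
        }
    }

    public static void main(String[] args) {
        List<Program> programs = Arrays.asList(
                new Program("tp1", 1, "A", "B", false),
                new Program("tp2", 1, "B", "C", false),
                new Program("tp3", 1, "B", "A", false));   // would loop without rule 2
        System.out.println(findChains(programs, "A", "C"));
    }
}
```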
 +
 
 +
It is possible that some of the transformation programs included in a chain contain input variables. For each such variable, the Metadata Broker service adds a variable to the synthesized transformation program and maps it to the original one. This way one can specify the values of the variables contained in every transformation program involved in the chain, by specifying the values of the corresponding variables of the synthesized transformation program. This mechanism is necessary because the individual transformation programs contained in the chain are not visible to the caller. The only entity that the caller sees is the synthesized transformation program that is responsible for calling the ones it is built from.
 +
 
 +
Consider the case where a transformation program whose output language is a variable is added to a chain. When the service searches for another transformation program to append to the chain after that one, it may find a transformation program whose input language is 'en' (English). Then, the value 'en' will be assigned to the variable field describing the previous transformation program's output language. The same happens if an output field (schema or language) of a transformation program contains a specific value and the corresponding input field of the next transformation program is a variable. But what happens if the two fields are both variables? In this case, an input variable is added to the synthesized transformation program. When the caller uses this transformation program, he/she will need to specify a value for this variable. That value will then be assigned automatically both to the output field of the first transformation program and to the input field of the second transformation program.
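The cases described above for linking an output field to the next program's input field can be written down as a small decision function. This is a sketch only: the <tt>FieldLinkSketch</tt> class is hypothetical, <tt>null</tt> is used here to stand for a variable field without a value, and rejecting two different literal values is an assumption that follows from the requirement that chained programs be compatible.

```java
/** Hypothetical illustration, not part of the metadata broker service. */
final class FieldLinkSketch {

    /** Outcome of linking one output field to the next program's input field. */
    static final class Link {
        final String value;        // literal shared by both fields, or null
        final String variableName; // input variable added to the synthesized program, or null

        Link(String value, String variableName) {
            this.value = value;
            this.variableName = variableName;
        }
    }

    /**
     * @param outputField literal value of the output field, or null if it is variable
     * @param inputField  literal value of the next input field, or null if it is variable
     * @param freshName   name to use if a new input variable must be created
     */
    static Link link(String outputField, String inputField, String freshName) {
        if (outputField != null && inputField != null) {
            if (!outputField.equals(inputField)) {
                throw new IllegalArgumentException("Incompatible fields: "
                        + outputField + " vs " + inputField);
            }
            return new Link(outputField, null);
        }
        if (outputField != null) {
            return new Link(outputField, null); // input was variable: it takes the output's value
        }
        if (inputField != null) {
            return new Link(inputField, null);  // output was variable: it takes the input's value
        }
        // Both are variable: the caller must supply one value that is assigned to both.
        return new Link(null, freshName);
    }

    public static void main(String[] args) {
        System.out.println(link(null, "en", "lang1").value);
        System.out.println(link(null, null, "lang1").variableName);
    }
}
```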
 +
 
 +
Now let's see how one can call the '<tt>findPossibleTransformationPrograms</tt>' operation:
 +
 
 +
<pre>
 +
import org.apache.axis.message.addressing.Address;
 +
import org.apache.axis.message.addressing.EndpointReferenceType;
 +
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.TPIOType;
 +
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.FindPossibleTransformationProgramsResponse;
 +
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.MetadataBrokerPortType;
 +
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.FindPossibleTransformationPrograms;
 +
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.service.MetadataBrokerServiceAddressingLocator;
 +
 
 +
public class TestFindPossibleTPs {
 +
 +
public static void main(String[] args) {
 +
try {
 +
// Create endpoint reference to the service
 +
EndpointReferenceType endpoint = new EndpointReferenceType();
 +
endpoint.setAddress(new Address(args[0]));
 +
MetadataBrokerPortType broker = new MetadataBrokerServiceAddressingLocator().getMetadataBrokerPortTypePort(endpoint);
 +
 +
// Create the IO format descriptors
 +
TPIOType inFormat = TPIOType.fromParams(args[1], args[2], args[3], "");
 +
TPIOType outFormat = TPIOType.fromParams(args[4], args[5], args[6], "");
 +
 +
// Prepare the invocation parameters
 +
FindPossibleTransformationPrograms params = new FindPossibleTransformationPrograms();
 +
params.setInputFormat(inFormat.toXMLString());
 +
params.setOutputFormat(outFormat.toXMLString());
 +
 +
// Invoke the remote operation
 +
FindPossibleTransformationProgramsResponse resp = broker.findPossibleTransformationPrograms(params);
 +
String[] TPs = resp.getTransformationProgram();
 +
for (String TP : TPs) {
 +
System.out.println(TP);
 +
System.out.println();
 +
}
 +
 +
} catch (Exception e) {
 +
e.printStackTrace();
 +
}
 +
}
 +
}
 +
</pre>
 +
 
 +
This code fragment assumes the following:
 +
* args[0] = the Metadata Broker service URI
 +
* args[1] = the source format type (''''''resultset'''''', ''''''collection'''''' or ''''''record'''''')
 +
* args[2] = the source format language
 +
* args[3] = the source format schema (in ''''''schemaName=schemaURI'''''' format)
 +
* args[4] = the target format type (''''''resultset'''''', ''''''collection'''''' or ''''''record'''''')
 +
* args[5] = the target format language
 +
* args[6] = the target format schema (in ''''''schemaName=schemaURI'''''' format)
 +
 
 +
<br>First, an endpoint reference to the metadata broker service is created. Then, we have to create the source and target format descriptors. The remote operation accepts two strings describing the two metadata formats. These strings are nothing more than the serialized form of two '''''TPIOType''''' objects. The ''TPIOType'' class is the base class of the ''CollectionType'', ''ResultSetType'' and ''RecordType'' classes. This class defines the static method '''''fromParams''''', which creates and returns an object describing a metadata format based on given values for the format's schema, language, type and data reference. The returned object will be an instance of the correct class (derived from TPIOType), based on the given value for the 'type' attribute. Here, the 'reference' attribute is not used because we are interested in the metadata format itself and not in the data it describes. After constructing the two objects, we get their serialized form by calling the '''''toXMLString()''''' method on them. The returned strings are the ones that must be passed to the remote operation.
 +
 
 +
Next, we invoke the remote operation and then we just print the returned transformation programs.
  
-- [[User:Sboutsis|Sboutsis]] 19:11, 19 March 2007 (EET)
+
-- [[User:Sboutsis|Sboutsis]] 15:05, 30 July 2007 (EEST)

Latest revision as of 18:56, 6 July 2016

THIS SECTION OF GCUBE DOCUMENTATION IS CURRENTLY UNDER UPDATE.

Metadata Broker

Introduction

The main functionality of the Metadata Broker is to convert XML documents from some input schema and/or language to another. The inputs and outputs of the transformation process can be single records, ResultSets or entire collections. In the special case where both the inputs and the output are collections, a persistent transformation is possible, meaning that whenever there is a change in the input collection(s), the new data will be automatically transformed in order for the change to be reflected to the output collection.

Transformation Programs

Complex transformation processes are described by transformation programs, which are XML documents. Transformation programs are stored in the DIS. Each transformation program can reference other transformation programs and use them as “black-box” components in the transformation process it defines.

Each transformation program consists of:

  • One or more data input definitions. Each one defines the schema, language and type (record, ResultSet or collection) of the data that must be mapped to the particular input.
  • One or more input variables. Each one of them is a placeholder for an additional string value which must be passed to the transformation program at run-time.
  • Exactly one data output definition, which contains the output data type (record, ResultSet or collection), schema and language.
  • One or more transformation rule definitions.

Note: The name of the input or output schema must be given in the format SchemaName=SchemaURI, where SchemaName is the name of the schema and SchemaURI is the URI of its definition, e.g. DC=http://dublincore.org/schemas/xmls/simpledc20021212.xsd.
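
As an illustration of the SchemaName=SchemaURI format, the following sketch splits a descriptor into its two parts. The SchemaDescriptor class is a hypothetical helper written for this page and is not part of the metadata broker library. It splits at the first '=' only, so a URI that itself contains '=' is preserved.

```java
/** Hypothetical helper (not part of the metadata broker library). */
final class SchemaDescriptor {
    final String name;
    final String uri;

    private SchemaDescriptor(String name, String uri) {
        this.name = name;
        this.uri = uri;
    }

    /** Splits "SchemaName=SchemaURI" at the first '=' character. */
    static SchemaDescriptor parse(String descriptor) {
        int sep = descriptor.indexOf('=');
        // Reject descriptors with a missing name, a missing URI or no separator at all
        if (sep <= 0 || sep == descriptor.length() - 1) {
            throw new IllegalArgumentException("Expected SchemaName=SchemaURI but got: " + descriptor);
        }
        return new SchemaDescriptor(descriptor.substring(0, sep), descriptor.substring(sep + 1));
    }
}
```

Parsing the Dublin Core example above gives the name DC and the URI http://dublincore.org/schemas/xmls/simpledc20021212.xsd.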

Transformation Rules

Transformation rules are the building blocks of transformation programs. Each transformation program always contains at least one transformation rule. Transformation rules describe simple transformations and execute in the order in which they are defined inside the transformation program. Usually the output of a transformation rule is the input of the next one. So, a transformation program can be thought of as a chain of transformation rules which work together in order to perform the complex transformation defined by the whole transformation program.

Each transformation rule consists of:

  • One or more data input definitions. Each definition contains the schema, language, type (record, ResultSet, collection or input variable) and data reference of the input it describes. Each one of these elements (except for the 'type' element) can be either a literal value, or a reference to another value defined inside the transformation program (using XPath syntax).
  • Exactly one data output, which can be:
    • A definition that contains the output data type (record, ResultSet or collection), schema and language.
    • A reference to the transformation program's output (using XPath syntax). This is the way to express that the output of this transformation rule will also be the output of the whole transformation program, so such a reference is only valid for the transformation program's final rule.
  • The name of the underlying program to execute in order to do the transformation, using standard 'packageName.className' syntax.

A transformation rule can also be a reference to another transformation program. This way, whole transformation programs can be used as parts of the execution of another transformation program. The reference can be made using the unique id of the transformation program being referenced and a set of value assignments to its data inputs and variables.

Note: The name of the input or output schema must be given in the format SchemaName=SchemaURI, where SchemaName is the name of the schema and SchemaURI is the URI of its definition, e.g. DC=http://dublincore.org/schemas/xmls/simpledc20021212.xsd.

Variable fields inside data input/output definitions

Inside the definition of data inputs and outputs of transformation programs and transformation rules, any field except for 'Type' can be declared as a variable field. Just like input variables, variable fields get their values by run-time assignments. In order to declare an element as a variable field of its parent element, one needs to include 'isVariable=true' in the element's definition. When the caller invokes a broker operation in order to transform some metadata, he/she can provide a set of value assignments to the input variables and variable fields of the transformation program definition. But the caller has access only to the variables of the whole transformation program, not the internal transformation rules. However, transformation rules can also contain variable fields in their input/output definitions. Since the caller cannot explicitly assign values to them, such variable fields must contain an XPath expression as their value, which points to another element inside the transformation program that contains the value to be assigned. These references are resolved when each transformation rule is executed, so if, for example, a variable field of a transformation rule's input definition points to a variable field of the previous transformation rule's output definition, it is guaranteed that the referenced element's value will be there at the time of execution of the second transformation rule. It is important to note that every XPath expression should specify an absolute location inside the document, which basically means it should start with '/'.
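
To make the reference mechanism more concrete, here is a minimal sketch of how an absolute XPath expression stored in a variable field could be resolved against the transformation program document. It uses only the standard javax.xml.xpath API. The VariableFieldResolver class is hypothetical and does not claim to mirror how the Metadata Broker service resolves references internally. The sketch assumes the referenced element already holds its run-time value.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

/** Hypothetical resolver for XPath references stored in variable fields. */
final class VariableFieldResolver {

    /** Evaluates an absolute XPath expression and returns the text of the element it points to. */
    static String resolve(String transformationProgramXml, String xpathExpression) {
        String expr = xpathExpression.trim();
        // The text above requires absolute locations, i.e. expressions starting with '/'
        if (!expr.startsWith("/")) {
            throw new IllegalArgumentException("XPath expression must be absolute: " + expr);
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(transformationProgramXml));
            return xpath.evaluate(expr, source).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate XPath expression: " + expr, e);
        }
    }
}
```

Given a transformation program whose TPInput schema field has been assigned a value, resolving //Input[@name='TPInput']/Schema returns that value, which is what a rule input referencing the same expression would receive.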

There is a special case where the language and schema fields of a transformation program's data input definition can automatically get values assigned to them, without requiring the caller to do so. This can happen when the type of the particular data input is set to collection. In this case, the Metadata Broker Service automatically retrieves the format of the metadata collection described by the ID that is given through the Reference field of the data input definition and assigns the actual schema descriptor and language identifier of the collection to the respective variable fields of the data input definition. If any of these fields already contain values, these values are compared with the ones retrieved from the metadata collection's profile, and if they are different the execution of the transformation program stops and an exception is thrown by the Metadata Broker service. Note that the automatic value assignment works only on data inputs of transformation programs and NOT on data inputs of individual transformation rules.

Programs

A program (not to be confused with transformation program) is the Java class which performs the actual transformation on the input data. A transformation rule is just an XML description of the interface (inputs and output) of a program. A program must implement the Program Java interface:

interface Program {
  public String getOutput();
}

getOutput() returns the output of the transformation program as a string. If the output is a record, the return value should be the transformed record. If the output is a ResultSet, the return value should be the ResultSet EPR. Finally, if the output is a collection, the return value should be the collection id.

The Program interface does not define any transformation methods. Each program can define any number of methods, but when the transformation rule which references it is executed, the metadata broker service will use reflection in order to locate the correct method to call based on the input and output types defined in the transformation rule that initiates the call to the program's transformation method. The valid data types for the parameters of each transformation method (so that the broker can locate and use them) are:

  • RecordType: A data type that holds the schema, language and payload of a full record.
  • ResultSetType: A data type that holds the schema, language and EPR of a ResultSet.
  • CollectionType: A data type that holds the schema, language and id of a collection.
  • VariableType: A data type that holds the string value of a variable defined inside a transformation program.

The definitions of these data types are contained in the metadata broker library.

When a transformation method of a program is called as the result of the execution of a transformation rule with N inputs and one output, the following convention is used:

  • The first N parameters passed to the method are objects holding information about the input data.
  • The last parameter is an object holding information about the output data.

The type of each parameter should be one of the four types mentioned above (RecordType, ResultSetType, CollectionType, VariableType).
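
The reflection-based lookup and the calling convention described above can be illustrated with a small self-contained sketch. RecordStub and VariableStub are stand-ins for the library's RecordType and VariableType, and UpperCaseProgram and MethodLocator are hypothetical classes. The matching strategy shown (exact comparison of parameter types) is an assumption and may differ from what the service actually does.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Stand-ins for the library's RecordType and VariableType classes.
class RecordStub {
    final String payload;
    RecordStub(String payload) { this.payload = payload; }
}

class VariableStub {
    final String value;
    VariableStub(String value) { this.value = value; }
}

// A toy program: the first two parameters are inputs, the last one describes the output.
class UpperCaseProgram {
    private String output;

    public void transform(RecordStub in, VariableStub suffix, RecordStub out) {
        output = in.payload.toUpperCase() + suffix.value;
    }

    public String getOutput() {
        return output;
    }
}

final class MethodLocator {

    /** Returns the first public method whose parameter types equal the given ones, or null. */
    static Method find(Class<?> program, Class<?>... parameterTypes) {
        for (Method m : program.getMethods()) {
            if (Arrays.equals(m.getParameterTypes(), parameterTypes)) {
                return m;
            }
        }
        return null;
    }

    /** Locates the method matching the runtime types of the arguments and invokes it. */
    static void call(Object program, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        Method m = find(program.getClass(), types);
        if (m == null) {
            throw new IllegalArgumentException("No matching transformation method");
        }
        try {
            m.invoke(program, args);
        } catch (Exception e) {
            throw new IllegalStateException("Invocation failed", e);
        }
    }
}
```

Calling MethodLocator.call(program, new RecordStub("abc"), new VariableStub("!"), new RecordStub("")) selects the three-parameter transform method, after which getOutput() returns the result.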

Implementation Overview

The metadata broker consists of two components:

  • The metadata broker service
    The metadata broker service provides the functionality of the metadata broker in the form of a stateless service. In the case of a persistent transformation, the service creates a WS-Resource holding information about this transformation and registers for notifications concerning changes in the input collection(s). The created resources are not published and remain completely invisible to the caller.

    The service exposes the following operations:
    • transform(TransformationProgramID, params) -> String
      This operation takes the DiligentID of a transformation program stored in the DIS and a set of transformation parameters. The referenced transformation program is executed using the provided parameters, which are just a set of value assignments to variables defined inside the transformation program. The metadata broker library contains a helper class for creating such a parameter set.
    • transformWithNewTP(TransformationProgram, params) -> String
      This operation offers the same functionality as the previous one. However, in this case the first parameter is the full XML definition of a transformation program in string format and not the DiligentID of a stored one.
    • findPossibleTransformationPrograms (InputDesc, OutputDesc) -> TransformationProgram[]
      This operation takes the description of some input format (type, language and schema) as well as the description of a desired output format, and returns an array of transformation program definitions that could be used in order to perform the required conversion. These transformation programs may not exist before invoking this operation. They are produced on the fly, by combining all the existing transformation programs which are compatible with each other, trying to synthesize more complex transformation programs. Of course, if there is already an existing transformation program which is applicable for the requested type of transformation, it is included in the results. If the output format is null, then the returned array contains all transformation programs that can be applied to the specified input format, producing any possible output format.

  • The metadata broker library
    The metadata broker library contains the definitions of the RecordType, CollectionType, ResultSetType and VariableType Java classes, as well as the definition of the Program Java interface. The following programs are also included in it:
    • Generic XSLT record-to-record transformer (GXSLT_Rec2Rec): transforms a given record using a given XSLT definition. The output is the transformed record.
    • Generic XSLT ResultSet-to-ResultSet transformer (GXSLT_RS2RS): transforms a given ResultSet using a given XSLT definition, producing a new ResultSet. The output is the new ResultSet's EPR.
    • Generic XSLT Collection-to-Collection transformer (GXSLT_Col2Col): transforms a given collection using a given XSLT, producing a new collection. The output is the new collection id.
    • Generic XSLT ResultSet-to-Collection transformer (GXSLT_RS2Col): transforms the records of a given ResultSet using a given XSLT, and adds them to a new collection with caller-defined attributes. The output is the new collection id.
    • Generic XSLT Collection-to-ResultSet transformer (GXSLT_Col2RS): transforms each record of a given collection using a given XSLT and creates a new ResultSet containing the transformed records. The output is the new ResultSet's EPR.
The transformation of metadata using any of the above programs, except for the GXSLT_Rec2Rec program, is a non-blocking operation. This means that the caller will not block until the transformation is completed, since the process of transforming a big ResultSet or collection may be quite time-consuming. For this purpose, each program prepares the output data (which is either the endpoint reference of the output ResultSet or the ID of the output collection, depending on the output data type of the transformation) which should be returned to the caller and then spawns a new thread to perform the transformation process.
Internally, some programs depend on others, meaning that they use other programs in order to avoid useless code duplication. For instance, the GXSLT_Rec2Rec program is used by every other program because the transformation of any complex type of data input (such as ResultSets or collections) finally comes down to transforming single records one-by-one. Of course the XSLTs are always compiled before performing bulk transformations, in order to make the whole process faster.
Each program is placed in a Java package of its own, beginning with ‘org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs’. However, this is just a convention followed for the default programs contained in the metadata broker library. There is no restriction on the package names of user-defined programs. In order for user-defined programs to be accessible by the Metadata Broker, they should be put in JAR files and copied to the ‘lib’ directory under the installation directory of ws-core (or to any directory that belongs to the CLASSPATH environment variable).
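
The non-blocking behaviour of the bulk programs described above (prepare the output identifier first, then transform in a separate thread) can be sketched as follows. This is a simplified, hypothetical illustration: NonBlockingProgramSketch is not a class of the metadata broker library, the output identifier is a fixed placeholder string, and upper-casing stands in for the real per-record transformation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a program that returns its output reference before the work is done. */
final class NonBlockingProgramSketch {
    private String output;
    private Thread worker;
    final List<String> results = Collections.synchronizedList(new ArrayList<String>());

    /** Publishes the output identifier immediately, then transforms the records in the background. */
    void transform(final List<String> records) {
        // Placeholder for the ResultSet EPR or the collection ID of the output
        output = "output-id-1";
        worker = new Thread(new Runnable() {
            public void run() {
                for (String record : records) {
                    results.add(record.toUpperCase());
                }
            }
        });
        worker.start();
    }

    /** Available as soon as transform() returns, even if the worker is still running. */
    String getOutput() {
        return output;
    }

    /** Only needed by this sketch, to observe the final state of the background work. */
    void awaitCompletion() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The caller receives the output identifier right away and can start consuming the output while records are still being added to it.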

Dependencies

  • MetadataBrokerService
    • jdk 1.5
    • WS-Core
    • MetadataBrokerLibrary
    • DISHLSClient
  • MetadataBrokerLibrary
    • jdk 1.5
    • WS-Core
    • ResultSet bundle
    • DISHLSClient
    • Metadata catalog service stubs
    • Metadata catalog library

Usage Examples

The following examples show how some of the transformation programs contained in the metadata broker library can be used. For this purpose, the client-side code is shown, describing the necessary steps to invoke the operations of the metadata broker service. Furthermore, the full definition of the referenced programs and transformation programs is also given. These definitions can be used as the base for creating new programs and transformation programs by anyone who needs to do this.

Transforming a single record using an XSLT

This is the GXSLT_Rec2Rec class (included in the metadata broker library), which performs the actual conversion:

package org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_Rec2Rec;

import org.apache.log4j.Logger;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.Program;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.RecordType;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.VariableType;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.util.GenericResourceRetriever;

import java.io.StringReader;
import java.io.StringWriter;
import java.rmi.RemoteException;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class GXSLT_Rec2Rec implements Program {
	
	private static Logger log = Logger.getLogger(GXSLT_Rec2Rec.class);
	private StringWriter output;
	private static TransformerFactory factory = TransformerFactory.newInstance();
		
	public void transform(RecordType record, VariableType xslt, RecordType outRecord) throws RemoteException {
		try {
			String xsltdef = GenericResourceRetriever.retrieveGenericResource(xslt.getReference());
			Transformer t = factory.newTransformer(new StreamSource(new StringReader(xsltdef)));
			t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
			output = new StringWriter();
			t.transform(new StreamSource(new StringReader(record.getReference())), new StreamResult(output));
		} catch(Exception e) {
			log.error("Failed to transform record. Throwing exception.");
			throw new RemoteException(e.toString());
		}
	}
	
	public void transform(String record, Templates xslt) throws RemoteException {
		try {
			Transformer t = xslt.newTransformer();
			t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
			output = new StringWriter();
			t.transform(new StreamSource(new StringReader(record)), new StreamResult(output));
		} catch(Exception e) {
			log.error("Failed to transform record. Throwing exception.");
			throw new RemoteException(e.toString());
		}
	}
	
	public static Templates compileXSLT(String xslt) throws TransformerConfigurationException {
		return factory.newTemplates(new StreamSource(new StringReader(xslt)));
	}
	
	public String getOutput() {
		return output.toString();
	}	
}

The only transformation method that can be used externally (when this program is called by a transformation program) is 'public void transform(RecordType record, VariableType xslt, RecordType outRecord)'. The other 'transform' method as well as the 'compileXSLT' method are intended to be used internally by other programs which call GXSLT_Rec2Rec during their execution.

This is the XML definition of the transformation program:

	 
<?xml version="1.0" encoding="UTF-8"?>	 
<TransformationProgram>
	<Input name="TPInput">
		<Schema isVariable="true" />
		<Language isVariable="true" />
		<Type>record</Type>
		<Reference isVariable="true" />
	</Input>
	<Variable name="XSLT" />
	<Output name="TPOutput">
		<Schema isVariable="true" />
		<Language isVariable="true" />
		<Type>record</Type>
	</Output>
	<TransformationRule>
		<Definition>
			<Transformer>org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_Rec2Rec.GXSLT_Rec2Rec</Transformer>
			<Input name="Rule1Input1">
				<Schema isVariable="true"> //Input[@name='TPInput']/Schema </Schema>
				<Language isVariable="true"> //Input[@name='TPInput']/Language </Language>
				<Type>record</Type>
				<Reference isVariable="true"> //Input[@name='TPInput']/Reference </Reference>
			</Input>
			<Input name="Rule1Input2">
				<Schema />
				<Language />
				<Type>variable</Type>
				<Reference isVariable="true"> //Variable[@name='XSLT'] </Reference>
			</Input>
			<Output name="TPRule1Output">
				<Reference>//Output[@name='TPOutput']</Reference>
			</Output>
		</Definition>
	</TransformationRule>
</TransformationProgram>

In this example, the transformation program defined above is stored in the DIS as a profile with UniqueID=ce6b9860-ebfe-11db-8b69-dd428ed9686d. The input record that is going to be transformed is stored in a local file named input.xml, and the XSLT that will be used is stored as a generic resource with UniqueID=ed358e00-23f2-11dc-a35f-9c01d805f283 in the DIS. The following code fragment reads the input record from the file, creates a set of parameters which are used in order to assign the input data and the XSLT ID to the respective transformation program variable inputs, and then invokes the transform operation of the metadata broker service. The result is written to the console. The URI of the remote service is given as a command-line argument.

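import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.axis.message.addressing.Address;
import org.apache.axis.message.addressing.EndpointReferenceType;
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.MetadataBrokerPortType;
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.service.MetadataBrokerServiceAddressingLocator;

public class Client {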
	public static void main(String[] args) {
		try {
			// Get the broker service porttype
			EndpointReferenceType endpoint = new EndpointReferenceType();
			endpoint.setAddress(new Address(args[0]));
			MetadataBrokerPortType broker = new MetadataBrokerServiceAddressingLocator().getMetadataBrokerPortTypePort(endpoint);

			// Read the input data file into a string
			String inputData = readTextFile("input.xml");
				
			// Create a set of transformation parameters, assigning values to variables
			// defined in the transformation program
			TransformationParameters tparams = TransformationParameters.newInstance();
			tparams.addParameter("//Input[@name='TPInput']/Schema", "Schema1=URI1");
			tparams.addParameter("//Input[@name='TPInput']/Language", "en");
			tparams.addParameter("//Input[@name='TPInput']/Reference", inputData);
			tparams.addParameter("//Output[@name='TPOutput']/Schema", "Schema2=URI2");
			tparams.addParameter("//Output[@name='TPOutput']/Language", "en");
			tparams.addParameter("//Variable[@name='XSLT']", "ed358e00-23f2-11dc-a35f-9c01d805f283");

			// Prepare the invocation parameters
			TransformWithNewTP params = new TransformWithNewTP();
			params.setTransformationProgramID("ce6b9860-ebfe-11db-8b69-dd428ed9686d");
			params.setParameters(tparams.getAsString());

			// Invoke the remote operation and write the result to the console
			System.out.println(broker.transform(params));
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
		 
	private static String readTextFile(String filename) throws IOException {
		BufferedReader br = new BufferedReader(new FileReader(filename));
		try {
			StringBuilder buf = new StringBuilder();
			String line;
			while ((line = br.readLine()) != null) {
				buf.append(line).append('\n');
			}
			return buf.toString();
		} finally {
			// Close the reader even if reading fails
			br.close();
		}
	}
}

Transforming an entire ResultSet using an XSLT

This is the definition of the GXSLT_RS2RS class (included in the metadata broker library), which performs the actual conversion:

package org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_RS2RS;

import java.rmi.RemoteException;

import org.apache.log4j.Logger;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.Program;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.ResultSetType;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.VariableType;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_RS2RS.GXSLT_RS2RS_Worker;
import org.diligentproject.searchservice.searchlibrary.rsclient.elements.RSResourceWSRFType;
import org.diligentproject.searchservice.searchlibrary.rswriter.RSXMLWriter;

public class GXSLT_RS2RS implements Program {
	private static Logger log = Logger.getLogger(GXSLT_RS2RS.class);
	private String output = null; 

	public void transform(ResultSetType RS, VariableType xslt, ResultSetType outRS) throws RemoteException {
		try {
			RSXMLWriter writer = RSXMLWriter.getRSXMLWriter();
			new GXSLT_RS2RS_Worker(RS, writer, xslt).start();
			output = writer.getRSLocator(new RSResourceWSRFType()).getLocator();
		} catch (Exception e) {
			log.error("GXSLT_RS2RS: Failed to create writer for output resultset.", e);
			throw new RemoteException("GXSLT_RS2RS: Failed to create writer for output resultset.", e);
		}
	}

	public String getOutput() {
		return this.output;
	}

}

As stated before, bulk transformations are non-blocking. For this reason, the above code spawns a new thread to handle the transformation process. The definition of the GXSLT_RS2RS_Worker class (which extends the Thread class) follows.

package org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_RS2RS;

import javax.xml.transform.Templates;
import org.apache.log4j.Logger;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.ResultSetType;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.VariableType;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_Rec2Rec.GXSLT_Rec2Rec;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.util.GenericResourceRetriever;
import org.diligentproject.searchservice.searchlibrary.resultset.elements.ResultElementGeneric;
import org.diligentproject.searchservice.searchlibrary.rsclient.elements.RSLocator;
import org.diligentproject.searchservice.searchlibrary.rsclient.elements.RSResourceLocalType;
import org.diligentproject.searchservice.searchlibrary.rsreader.RSXMLIterator;
import org.diligentproject.searchservice.searchlibrary.rsreader.RSXMLReader;
import org.diligentproject.searchservice.searchlibrary.rswriter.RSXMLWriter;

class GXSLT_RS2RS_Worker extends Thread {
	private static Logger log = Logger.getLogger(GXSLT_RS2RS_Worker.class);
	private ResultSetType RS;
	private RSXMLWriter writer;
	private VariableType xslt;

	public GXSLT_RS2RS_Worker(ResultSetType resultSet, RSXMLWriter RSWriter, VariableType xsltToUse) {
		RS = resultSet;
		writer = RSWriter;
		xslt = xsltToUse;
	}

	public void run() {
		String element = null;
		int i = 0;
	
		try {
			/* Compile the XSLT so that the records in the resultset will be transformed faster */
			String xsltdef = GenericResourceRetriever.retrieveGenericResource(xslt.getReference());
			Templates compiledXSLT = GXSLT_Rec2Rec.compileXSLT(xsltdef);

			/* Read each record of the input ResultSet, use the GXSLT_Rec2Rec program in order to transform 
			 * it and add it to the output ResultSet. */
			GXSLT_Rec2Rec GXSLTRecProgram = new GXSLT_Rec2Rec();
			RSXMLReader reader = RSXMLReader.getRSXMLReader(new RSLocator(RS.toString()));
			RSXMLIterator iter = reader.makeLocalPatiently(new RSResourceLocalType(), 1200000).getRSIterator(1200000);
			while(iter.hasNext()) {
				if (writer.isTimerAlive())
					writer.resetTimer();
				ResultElementGeneric elem = (ResultElementGeneric)iter.next(ResultElementGeneric.class);
				if (elem == null)
					continue;
				element = elem.getPayload();
				GXSLTRecProgram.transform(element, compiledXSLT);
				writer.addResults(new ResultElementGeneric(elem.getRecordAttributes(ResultElementGeneric.RECORD_ID_NAME)[0].getAttrValue(),
					elem.getRecordAttributes(ResultElementGeneric.RECORD_COLLECTION_NAME)[0].getAttrValue(),
					GXSLTRecProgram.getOutput()));
				i++;
				element = null;
			}
			writer.close();
		} catch (Exception e) {
			if (element != null)
				i++;
			log.error("GXSLT_RS2RS: Failed to transform the given resultset. Stopped at element " + String.valueOf(i) + ":\n" + element, e);
			e.printStackTrace();
			try {
				writer.close();
			} catch (Exception e1) {
				log.error("GXSLT_RS2RS: Failed to close resultset.");
			}
		}
	}
}

The above code uses the GXSLT_Rec2Rec program to compile the XSLT so that the transformation executes as fast as possible. Then it iterates over the whole set of elements contained in the ResultSet, transforming each one using the compiled XSLT. Each transformed element is then added to the output ResultSet.
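
The compile-once, transform-many pattern used by the worker can be tried out in isolation with the standard javax.xml.transform API alone. The following sketch is independent of the DILIGENT classes: BulkXsltSketch is a hypothetical name, and plain strings stand in for the ResultSet elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical stand-alone version of the compile-once, transform-many pattern. */
final class BulkXsltSketch {

    /** Compiles the XSLT a single time and applies it to every record in turn. */
    static List<String> transformAll(String xslt, List<String> records) {
        try {
            Templates compiled = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
            List<String> transformed = new ArrayList<String>();
            for (String record : records) {
                // A Transformer is cheap to create from already compiled Templates
                Transformer t = compiled.newTransformer();
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(record)), new StreamResult(out));
                transformed.add(out.toString());
            }
            return transformed;
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Creating a new Transformer per record from the shared Templates object avoids re-parsing the stylesheet for each record, which is the speed-up the worker relies on.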

The following is the XML definition of the transformation program used for this type of transformation.

<TransformationProgram>
	<Input name="TPInput">
		<Schema isVariable="true" />
		<Language isVariable="true" />
		<Type>resultset</Type>
		<Reference isVariable="true" />
	</Input>
	<Variable name="XSLT" />
	<Output name="TPOutput">
		<Schema isVariable="true" />
		<Language isVariable="true" />
		<Type>resultset</Type>
	</Output>
	<TransformationRule>
		<Definition>
			<Transformer>org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.GXSLT_RS2RS.GXSLT_RS2RS</Transformer>
			<Input name="Rule1Input1">
				<Schema isVariable="true"> //Input[@name='TPInput']/Schema </Schema>
				<Language isVariable="true"> //Input[@name='TPInput']/Language </Language>
				<Type>resultset</Type>
				<Reference isVariable="true"> //Input[@name='TPInput']/Reference </Reference>
			</Input>
			<Input name="Rule1Input2">
				<Schema />
				<Language />
				<Type>variable</Type>
				<Reference isVariable="true"> //Variable[@name='XSLT'] </Reference>
			</Input>
			<Output name="TPRule1Output">
				<Reference>//Output[@name='TPOutput']</Reference>
			</Output>
		</Definition>
	</TransformationRule>
</TransformationProgram>

In this example, the transformation program defined above is stored in the DIS as a profile with UniqueID=eb46fc40-ebfe-11db-8b6b-dd428ed9686d. The EPR of the input ResultSet to be transformed is stored in a local file named input.xml, and the XSLT is the same one used in the previous example. The URI of the remote service is given as a command-line argument. The client code that invokes the broker service and performs the transformation is the same as in the previous example; the only change is the ID of the transformation program to call, which should be set to eb46fc40-ebfe-11db-8b6b-dd428ed9686d.

Using a transformation program within another transformation program

As stated before, whole transformation programs can be used as 'black-box' components inside another transformation program. This can be done by defining a transformation rule which describes the call to the second transformation program.

The transformation program that will be called from another transformation program in this example is defined below.

<TransformationProgram>
    <Input name="TPInput">
        <Schema>SCH1=http://schema1.xsd</Schema>
        <Language>en</Language>
        <Type>resultset</Type>
        <Reference isVariable="true" />
    </Input>
    <Output name="TPOutput">
        <Schema>SCH3=http://schema3.xsd</Schema>
        <Language>en</Language>
        <Type>resultset</Type>
    </Output>
    <TransformationRule>
        <Definition>
            <Transformer>org.diligentproject.program2</Transformer>
            <Input name="Rule2Input">
                <Schema isVariable="true">//Output[@name='TPRule1Output']/Definition/Schema</Schema>
                <Language isVariable="true">//Output[@name='TPRule1Output']/Definition/Language</Language>
                <Type>resultset</Type>
                <Reference isVariable="true">//Output[@name='TPRule1Output']/Definition/Reference</Reference>
            </Input>
            <Output name="Rule2Output">
                <Reference>//Output[@name='TPOutput']</Reference>
            </Output>
        </Definition>
    </TransformationRule>
</TransformationProgram>

The input and output schemas and languages are predefined inside this transformation program, so the only thing that should be specified at run-time is the actual input data reference. Let's say that this transformation program is stored in the DIS and its UniqueID is 910e0710-f251-11db-88f9-f971eaf0d653.

The transformation program that uses the above transformation program is defined below.

<TransformationProgram>
    <Input name="TPInput">
        <Schema>SCH1=http://schema1.xsd</Schema>
        <Language>en</Language>
        <Type>resultset</Type>
        <Reference isVariable="true" />
    </Input>
    <Variable name="var1"/>
    <Output name="TPOutput">
        <Schema>SCH2=http://schema2.xsd</Schema>
        <Language>en</Language>
        <Type>resultset</Type>
    </Output>
    <TransformationRule>
        <Reference>
            <Program>910e0710-f251-11db-88f9-f971eaf0d653</Program>
            <Value isVariable="true" target="//Input[@name='TPInput']/Reference">//Input[@name='TPInput']/Reference</Value>
            <Output name="Rule1Output" />
        </Reference>
    </TransformationRule>
    <TransformationRule>
        <Definition>
            <Transformer>org.diligentproject.program1</Transformer>
            <Input name="Rule2Input1">
                <Schema isVariable="true">//Output[@name='Rule1Output']/Definition/Schema</Schema>
                <Language isVariable="true">//Output[@name='Rule1Output']/Definition/Language</Language>
                <Type>resultset</Type>
                <Reference isVariable="true">//Output[@name='Rule1Output']/Definition/Reference</Reference>
            </Input>
            <Input name="Rule2Input2">
                <Schema />
                <Language />
                <Type>variable</Type>
                <Reference isVariable="true"> //Variable[@name='var1'] </Reference>
            </Input>
            <Output name="Rule2Output">
                <Reference>//Output[@name='TPOutput']</Reference>
            </Output>
        </Definition>
    </TransformationRule>
</TransformationProgram>

The first TransformationRule element describes the call to the first transformation program. It specifies the UniqueID of the transformation program to be called, together with a mapping of values to that program's variable inputs. Since the first transformation program contains only one variable input (the input data reference), this example needs only one mapping, expressed as a Value element. The target attribute of this element identifies the element of the called transformation program whose value is to be set, and the element's content supplies the value. Here the value is not a literal but a reference to another element of the calling transformation program, from which the value is taken: the input of the first transformation program is set to the input of the second transformation program. Because the content of the Value element is an XPath expression, the isVariable attribute is set to true, meaning that the content is interpreted as a reference to another element and not as a literal value. The output of the first transformation program becomes the output of the transformation rule that called it, and is named Rule1Output. This output is then used as the input of the next transformation rule.
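To make the role of the target attribute and the isVariable flag more concrete, the sketch below resolves one Value mapping with the standard javax.xml.xpath API. It is an illustration under assumptions, not the broker's actual resolution code: the class ValueMappingSketch, its methods and the placeholder reference string are hypothetical, and the two programs are reduced to the single element involved in the mapping.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of the Metadata Broker library.
class ValueMappingSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Applies one Value mapping: the content is looked up in the calling program
    // when isVariable is true, otherwise it is used as a literal. The result is
    // written into the element of the called program selected by 'target'.
    static void applyValue(Document caller, Document callee, String target,
                           String content, boolean isVariable) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String value = isVariable ? xpath.evaluate(content.trim(), caller) : content;
        Node targetNode = (Node) xpath.evaluate(target, callee, XPathConstants.NODE);
        if (targetNode == null) {
            throw new IllegalArgumentException("No element matches target " + target);
        }
        targetNode.setTextContent(value);
    }

    public static void main(String[] args) throws Exception {
        // Calling program: its input reference has already been supplied at run-time.
        Document caller = parse("<TransformationProgram><Input name='TPInput'>"
                + "<Reference>placeholder-input-reference</Reference>"
                + "</Input></TransformationProgram>");
        // Called program: its input reference is still an unset variable.
        Document callee = parse("<TransformationProgram><Input name='TPInput'>"
                + "<Reference isVariable='true'/></Input></TransformationProgram>");

        applyValue(caller, callee, "//Input[@name='TPInput']/Reference",
                "//Input[@name='TPInput']/Reference", true);
        System.out.println(callee.getDocumentElement().getTextContent());
    }
}
```

With isVariable set to false, the same method would copy the content verbatim into the target element instead of evaluating it as an XPath expression.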

Finding a set of transformation programs given source and target metadata formats

This example demonstrates how to obtain an array of transformation programs that can transform metadata from a given source format to a given target format. The operation used for this is 'findPossibleTransformationPrograms'. The caller specifies a source and a target metadata format, and the service searches for possible "chains" of existing transformation programs that could carry out the transformation. The Metadata Broker service imposes three rules:

  • Only transformation programs with one data input are considered during the search
  • Each transformation program can be used at most one time inside each chain of transformation programs (this is needed in order to avoid infinite loops)
  • A transformation program that produces a collection as its output can only be the last one inside a chain of transformation programs
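The three rules above can be read as constraints on a depth-first search over the stored transformation programs. The sketch below shows one way such a search could look; it is not the service's actual algorithm. The Program class, the format strings of the form "type:language:schema" and the program IDs are all hypothetical simplifications, and a chain is reported as soon as it first reaches the target format.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of the Metadata Broker service.
class ChainSearchSketch {

    // Simplified description of a stored transformation program.
    static final class Program {
        final String id;
        final int dataInputs;
        final String inFormat;
        final String outFormat;
        final boolean outputsCollection;

        Program(String id, int dataInputs, String inFormat, String outFormat,
                boolean outputsCollection) {
            this.id = id;
            this.dataInputs = dataInputs;
            this.inFormat = inFormat;
            this.outFormat = outFormat;
            this.outputsCollection = outputsCollection;
        }
    }

    static List<List<String>> findChains(List<Program> all, String source, String target) {
        List<List<String>> chains = new ArrayList<>();
        search(all, source, target, new LinkedHashSet<>(), chains);
        return chains;
    }

    private static void search(List<Program> all, String current, String target,
                               Set<String> used, List<List<String>> chains) {
        for (Program p : all) {
            if (p.dataInputs != 1) continue;           // rule 1: exactly one data input
            if (used.contains(p.id)) continue;         // rule 2: at most once per chain
            if (!p.inFormat.equals(current)) continue; // must accept the current format
            used.add(p.id);
            if (p.outFormat.equals(target)) {
                chains.add(new ArrayList<>(used));     // chain reaches the target
            } else if (!p.outputsCollection) {         // rule 3: collection output ends a chain
                search(all, p.outFormat, target, used, chains);
            }
            used.remove(p.id);
        }
    }

    public static void main(String[] args) {
        List<Program> all = List.of(
                new Program("A", 1, "resultset:en:SCH1", "resultset:en:SCH2", false),
                new Program("B", 1, "resultset:en:SCH2", "resultset:en:SCH3", false),
                new Program("C", 1, "resultset:en:SCH1", "resultset:en:SCH3", false),
                new Program("D", 2, "resultset:en:SCH1", "resultset:en:SCH3", false),
                new Program("F", 1, "resultset:en:SCH1", "collection:en:SCH2", true),
                new Program("G", 1, "collection:en:SCH2", "resultset:en:SCH3", false));
        // D is skipped (two data inputs); F cannot be followed by G (collection output).
        System.out.println(findChains(all, "resultset:en:SCH1", "resultset:en:SCH3"));
    }
}
```

The set of used program IDs is what prevents infinite loops: a program that is already part of the current chain is never considered again while that chain is being extended.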

Each chain composed by the Metadata Broker service is converted to a transformation program, which "links" the individual transformation programs forming the chain. This transformation program contains a transformation rule for each transformation program in the chain. Each transformation rule describes a call to the corresponding transformation program. The result of the operation is an array of strings, where each string corresponds to a synthesized transformation program.

Some of the transformation programs included in a chain may contain input variables. For each such variable, the Metadata Broker service adds a variable to the synthesized transformation program and maps it to the original one. This way, the caller can set the variables of every transformation program involved in the chain by specifying the values of the corresponding variables of the synthesized transformation program. This mechanism is necessary because the individual transformation programs contained in the chain are not visible to the caller. The only entity that the caller sees is the synthesized transformation program, which is responsible for calling the ones it is built from.

Consider the case where a transformation program whose output language is a variable is added to a chain. When the service searches for another transformation program to append to the chain after that one, it may find a transformation program whose input language is 'en' (English). In that case, the value 'en' is assigned to the variable field describing the previous transformation program's output language. The same happens if an output field (schema or language) of a transformation program contains a specific value and the corresponding input field of the next transformation program is a variable. But what happens if both fields are variables? In this case, an input variable is added to the synthesized transformation program. When the caller uses this transformation program, they will need to specify a value for this variable. That value is then assigned automatically both to the output field of the first transformation program and to the input field of the second transformation program.
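The three cases described above (variable output with concrete input, concrete output with variable input, and two variables) can be summarized in a few lines of code. This is a sketch only: the Field class, the link method and the generated variable names such as "var1" are hypothetical, and rejecting two differing concrete values is an assumption about how the search decides that two programs cannot be chained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Metadata Broker service.
class FieldLinkSketch {

    // A schema or language field: a concrete value, or a variable when value is null.
    static final class Field {
        String value;         // null means "variable, not yet bound"
        String variableName;  // set when bound to a synthesized input variable

        Field(String value) { this.value = value; }

        boolean isVariable() { return value == null; }
    }

    // Links the output field of one program to the input field of the next one.
    // Returns false when two concrete values differ (assumed to block chaining).
    static boolean link(Field out, Field in, List<String> synthesizedVariables) {
        if (out.isVariable() && in.isVariable()) {
            // Both are variables: expose one new input variable that feeds both.
            String name = "var" + (synthesizedVariables.size() + 1);
            synthesizedVariables.add(name);
            out.variableName = name;
            in.variableName = name;
            return true;
        }
        if (out.isVariable()) { out.value = in.value; return true; }
        if (in.isVariable()) { in.value = out.value; return true; }
        return out.value.equals(in.value);
    }

    public static void main(String[] args) {
        List<String> variables = new ArrayList<>();

        Field outLanguage = new Field(null);
        Field inLanguage = new Field("en");
        link(outLanguage, inLanguage, variables);
        System.out.println(outLanguage.value);   // prints "en"

        Field outSchema = new Field(null);
        Field inSchema = new Field(null);
        link(outSchema, inSchema, variables);
        System.out.println(variables);           // prints "[var1]"
    }
}
```

Only the case with two variables adds something the caller has to supply; in the other two cases the missing value is filled in during the search itself.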

Now let's see how one can call the 'findPossibleTransformationPrograms' operation:

import org.apache.axis.message.addressing.Address;
import org.apache.axis.message.addressing.EndpointReferenceType;
import org.diligentproject.metadatamanagement.metadatabrokerlibrary.programs.TPIOType;
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.FindPossibleTransformationProgramsResponse;
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.MetadataBrokerPortType;
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.FindPossibleTransformationPrograms;
import org.diligentproject.metadatamanagement.metadatabrokerservice.stubs.service.MetadataBrokerServiceAddressingLocator;

public class TestFindPossibleTPs {
	
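	public static void main(String[] args) {
		// Expect the service URI followed by the type, language and schema of both formats
		if (args.length < 7) {
			System.err.println("Usage: TestFindPossibleTPs <brokerURI> <srcType> <srcLanguage> <srcSchema> <tgtType> <tgtLanguage> <tgtSchema>");
			return;
		}
		try {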
			// Create endpoint reference to the service
			EndpointReferenceType endpoint = new EndpointReferenceType();
			endpoint.setAddress(new Address(args[0]));
			MetadataBrokerPortType broker = new MetadataBrokerServiceAddressingLocator().getMetadataBrokerPortTypePort(endpoint);
			
			// Create the IO format descriptors
			TPIOType inFormat = TPIOType.fromParams(args[1], args[2], args[3], "");
			TPIOType outFormat = TPIOType.fromParams(args[4], args[5], args[6], "");
						
			// Prepare the invocation parameters
			FindPossibleTransformationPrograms params = new FindPossibleTransformationPrograms();
			params.setInputFormat(inFormat.toXMLString());
			params.setOutputFormat(outFormat.toXMLString());
			
			// Invoke the remote operation
			FindPossibleTransformationProgramsResponse resp = broker.findPossibleTransformationPrograms(params);
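			String[] TPs = resp.getTransformationProgram();
			// Guard against an empty result before iterating
			if (TPs == null || TPs.length == 0) {
				System.out.println("No transformation programs found.");
				return;
			}
			for (String TP : TPs) {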
				System.out.println(TP);
				System.out.println();
			}
			
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

This code fragment assumes the following:

  • args[0] = the Metadata Broker service URI
  • args[1] = the source format type ('resultset', 'collection' or 'record')
  • args[2] = the source format language
  • args[3] = the source format schema (in 'schemaName=schemaURI' format)
  • args[4] = the target format type ('resultset', 'collection' or 'record')
  • args[5] = the target format language
  • args[6] = the target format schema (in 'schemaName=schemaURI' format)


First, an endpoint reference to the Metadata Broker service is created. Then, we have to create the source and target format descriptors. The remote operation accepts two strings describing the two metadata formats. These strings are nothing more than the serialized form of two TPIOType objects. The TPIOType class is the base class of the CollectionType, ResultSetType and RecordType classes. It defines the static method fromParams, which creates and returns an object describing a metadata format based on the given values for the format's type, language, schema and data reference. The returned object is an instance of the appropriate class derived from TPIOType, chosen according to the given value of the 'type' attribute. Here, the 'reference' attribute is left empty because we are interested in the metadata format itself and not in the data it describes. After constructing the two objects, we get their serialized form by calling the toXMLString() method on them. The returned strings are the ones that must be passed to the remote operation.

Next, we invoke the remote operation and then we just print the returned transformation programs.

-- Sboutsis 15:05, 30 July 2007 (EEST)