Spatial Data Processing

From Gcube Wiki
Revision as of 15:10, 1 August 2012 by Francesco.barchetta (Talk | contribs) (More informations)


Overview

Geospatial Data Processing takes advantage of the OGC Web Processing Service (WPS) as a web interface that allows the dynamic deployment of user processes. The WPS implementation chosen here is the 52°North WPS, which allows the development and deployment of user “algorithms”. It has been demonstrated that such “algorithms” can be developed so that they are processed by the powerful, distributed framework offered by Apache™ Hadoop™ MapReduce.

This is how WPS-hadoop was born.

Key Features

WPS-hadoop offers a web interface to access the algorithms from external HTTP clients through three different kinds of requests, made available by the 52°North WPS:

- The GetCapabilities operation provides access to general information about a live WPS implementation, and lists the operations and access methods supported by that implementation. 52N WPS supports the GetCapabilities operation via HTTP GET and POST.

- The DescribeProcess operation allows WPS clients to request a full description of one or more processes that can be executed by the service. This description includes the input and output parameters and formats and can be used to automatically build a user interface to capture the parameter values to be used to execute a process.

- The Execute operation allows WPS clients to run a specified process implemented by the server, using the input parameter values provided and returning the output values produced. Inputs can be included directly in the Execute request, or reference web accessible resources.
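As a rough illustration, the GetCapabilities and DescribeProcess operations can be addressed with plain KVP URLs. The sketch below only builds the URL strings; the class name WpsUrls and the base URL are assumptions made for the example, while the /WebProcessingService path is the one used in the test URL of the deployment section.

```java
public class WpsUrls {
    // Endpoint path as it appears in the GetCapabilities test URL on this page.
    private static final String ENDPOINT = "/WebProcessingService";

    /** Builds a KVP GetCapabilities URL for a WPS web application. */
    public static String getCapabilities(String baseUrl) {
        return baseUrl + ENDPOINT + "?Request=GetCapabilities&Service=WPS";
    }

    /** Builds a KVP DescribeProcess URL (WPS 1.0.0) for one process identifier. */
    public static String describeProcess(String baseUrl, String identifier) {
        return baseUrl + ENDPOINT
                + "?Request=DescribeProcess&Service=WPS&Version=1.0.0&Identifier=" + identifier;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/wps"; // assumed local deployment
        System.out.println(getCapabilities(base));
        System.out.println(describeProcess(base, "org.n52.wps.demo.ConvexHullDemo"));
    }
}
```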


Design

By extending the AbstractAlgorithm class (provided by 52N) we have created a new abstract class called HadoopAbstractAlgorithm. Its business logic, hidden from the developer, executes the process by creating a Job for the Hadoop framework.


Blocks.png


Develop a custom process

The custom process class has to extend HadoopAbstractAlgorithm, which lets you specify the Hadoop configuration parameters (e.g. from XML files), the Mapper and Reducer classes, the input paths, the output path, all the operations needed before running the process, and the way to retrieve the results. When using HadoopAbstractAlgorithm, you need to implement these simple methods:

  • protected Class<? extends Mapper<?, ?, LongWritable, Text>> getMapper()

This method returns the class to be used as Mapper;

  • protected Class<? extends Reducer<LongWritable, Text, ?, ?>> getReducer()

This method returns the class to be used as Reducer (if any);

  • protected Path[] getInputPaths(Map<String, List<IData>> inputData)

This method tells the business logic the exact input path(s) to pass to the Hadoop framework;

  • protected String getOutputPath()

This method tells the business logic the exact output path to pass to the Hadoop framework;

  • protected Map buildResults()

This method is called by the business logic to build the output that the WPS expects;

  • public void prepareToRun(Map<String, List<IData>> inputData)

This method has to contain all the operations to perform before running the Hadoop Job (e.g. WPS input validation);

  • protected JobConf getJobConf()

This method allows the user to specify all the configuration resources for the Hadoop framework (e.g. XML conf files).

HadoopAbstractAlgorithm.png
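The division of work described above (hidden business logic that calls a few methods filled in by the developer) can be sketched with a simplified stand-in. The code below is not the real HadoopAbstractAlgorithm: SketchAlgorithm, EchoAlgorithm, the use of plain strings instead of Path and IData, and the order in which the hooks are called are all assumptions made so that the example runs without Hadoop or 52N on the classpath.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TemplateSketch {

    /** Simplified stand-in for HadoopAbstractAlgorithm (assumption, not the real class). */
    public abstract static class SketchAlgorithm {
        protected abstract void prepareToRun(Map<String, List<String>> inputData);
        protected abstract String[] getInputPaths(Map<String, List<String>> inputData);
        protected abstract String getOutputPath();
        protected abstract Map<String, String> buildResults();

        /** The hidden "business logic": a fixed sequence around the hooks. */
        public final Map<String, String> run(Map<String, List<String>> inputData) {
            prepareToRun(inputData);                    // e.g. input validation
            String[] in = getInputPaths(inputData);
            String out = getOutputPath();
            submitJob(in, out);
            return buildResults();                      // output handed back to the WPS
        }

        // A real implementation would configure and submit a Hadoop job here.
        private void submitJob(String[] in, String out) {
            System.out.println("job: " + String.join(",", in) + " -> " + out);
        }
    }

    /** Hypothetical custom process that only fills in the hooks. */
    public static class EchoAlgorithm extends SketchAlgorithm {
        private String inputFile;

        @Override
        protected void prepareToRun(Map<String, List<String>> inputData) {
            List<String> values = inputData.get("InputFile");
            if (values == null || values.isEmpty()) {
                throw new IllegalArgumentException("InputFile is required");
            }
            inputFile = values.get(0);
        }

        @Override
        protected String[] getInputPaths(Map<String, List<String>> inputData) {
            return new String[] { inputFile };
        }

        @Override
        protected String getOutputPath() {
            return "/tmp/echo-output"; // hypothetical output location
        }

        @Override
        protected Map<String, String> buildResults() {
            Map<String, String> results = new LinkedHashMap<>();
            results.put("result", "processed " + inputFile);
            return results;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs = new HashMap<>();
        inputs.put("InputFile", Arrays.asList("http://example.org/coordinates"));
        System.out.println(new EchoAlgorithm().run(inputs));
    }
}
```

In the real library the same idea applies to getMapper(), getReducer() and getJobConf(), which are left out here because they depend on Hadoop types.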

Deploy a custom process

WPS-hadoop is deployed in a Tomcat container.

In order to deploy the newly developed process, you need to:

  1. Export the process as a jar file.
  2. Copy the exported jar into the WEB-INF/lib directory.
  3. Restart Tomcat.
    Next, we need to register the newly created algorithm:
  4. Go to http://localhost:yourport/wps/ , e.g. http://localhost:8080/wps/.
  5. Click on 52n WPSAdmin console.
  6. Log in with:
    • Username: wps
    • Password: wps
    The Web Admin Console lets you change the basic configuration of the WPS and upload processes.
  7. Click on Algorithm Repository --> Properties (the '+' sign).
  8. Click on the green '+' to register your process: type Algorithm in the left field and, in the right field, the fully qualified class name of your class (i.e. package + class name, e.g. org.n52.wps.demo.ConvexHullDemo).
  9. Click on the save icon (the 'disk').
  10. Next, click on 'Save and Activate configuration' at the top left.
  11. Your new process is now available; test it at http://localhost:yourport/wps/WebProcessingService?Request=GetCapabilities&Service=WPS or directly at http://localhost:yourport/wps/test.html.


N.B. As an alternative to step 1, you can follow steps 4, 5 and 6 first, then click on Upload Process and pick the .java file of the process you have just developed. Then continue from step 2 onwards.
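To verify the registration without reading the GetCapabilities response by eye, the response can be scanned for the process identifier. The following is a minimal sketch that works on a capabilities document already held in a string; the class name CapabilitiesCheck and the shortened sample document are assumptions made for the example, and fetching the document over HTTP is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CapabilitiesCheck {
    // Namespace of ows:Identifier, as declared in the XML listings on this page.
    private static final String OWS_NS = "http://www.opengis.net/ows/1.1";

    /** Returns true if any ows:Identifier element in the document equals the identifier. */
    public static boolean listsProcess(String capabilitiesXml, String identifier) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the namespace-based lookup below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(capabilitiesXml)));
            NodeList ids = doc.getElementsByTagNameNS(OWS_NS, "Identifier");
            for (int i = 0; i < ids.getLength(); i++) {
                if (identifier.equals(ids.item(i).getTextContent().trim())) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable capabilities document", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, hand-written sample; a real response contains many more elements.
        String sample = "<wps:Capabilities xmlns:wps=\"http://www.opengis.net/wps/1.0.0\""
                + " xmlns:ows=\"http://www.opengis.net/ows/1.1\"><wps:ProcessOfferings><wps:Process>"
                + "<ows:Identifier>org.n52.wps.demo.ConvexHullDemo</ows:Identifier>"
                + "</wps:Process></wps:ProcessOfferings></wps:Capabilities>";
        System.out.println(listsProcess(sample, "org.n52.wps.demo.ConvexHullDemo"));
    }
}
```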

Some complete examples

The Bathymetry Algorithm

Here is a complete description of how to use the WPS-hadoop library, through the example of retrieving bathymetry from a netCDF file.

Class Diagram

WPSClassDiagram.png

BathymetryAlgorithm.xml

This file must have exactly the same name as the .java file (apart from the extension).

<?xml version="1.0" encoding="UTF-8"?>
<wps:ProcessDescriptions xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
http://geoserver.itc.nl:8080/wps/schemas/wps/1.0.0/wpsDescribeProcess_response.xsd" xml:lang="en-US" service="WPS" version="1.0.0">
        <ProcessDescription wps:processVersion="1.0.0" storeSupported="true" statusSupported="false">
                <ows:Identifier>com.terradue.wps.BathymetryAlgorithm</ows:Identifier>
                <ows:Title>Bathymetry Algorithm</ows:Title>
                <ows:Abstract>by Hadoop</ows:Abstract>
                <ows:Metadata xlink:title="Bathymetry" />
                <DataInputs>
                        <Input minOccurs="1" maxOccurs="1">
                                <ows:Identifier>InputFile</ows:Identifier>
                                <ows:Title>InputFile</ows:Title>
                                <ows:Abstract>URL to a file containing x,y parameters</ows:Abstract>
                                <LiteralData>
                                        <ows:DataType ows:reference="xs:string"></ows:DataType>
                                        <ows:AnyValue/>
                                </LiteralData>
                        </Input>
                </DataInputs>
                <ProcessOutputs>
                        <Output>
                                <ows:Identifier>result</ows:Identifier>
                                <ows:Title>result</ows:Title>
                                <ows:Abstract>result</ows:Abstract>
                                <LiteralOutput>
                                        <ows:DataType ows:reference="xs:string"/>
                                </LiteralOutput>
                        </Output>
                </ProcessOutputs>
        </ProcessDescription>
</wps:ProcessDescriptions>

Request examples

This is an example of how to request the execution of the BathymetryAlgorithm.

XML request example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<wps:Execute service="WPS" version="1.0.0" xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" 
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
    http://schemas.opengis.net/wps/1.0.0/wpsExecute_request.xsd">
    <ows:Identifier>com.terradue.wps_hadoop.examples.bathymetry.BathymetryAlgorithm</ows:Identifier>
    <wps:DataInputs>
        <wps:Input>
            <ows:Identifier>InputFile</ows:Identifier>
            <ows:Title>Input file for Bathymetry</ows:Title>
            <wps:Data>
                <wps:LiteralData>http://t2-10-11-30-97.play.terradue.int:8888/wps/maps/coordinates</wps:LiteralData>
            </wps:Data>
        </wps:Input>
    </wps:DataInputs>
    <wps:ResponseForm>
    <wps:ResponseDocument storeExecuteResponse="false">
        <wps:Output asReference="false">
            <ows:Identifier>result</ows:Identifier>
        </wps:Output>
    </wps:ResponseDocument>
    </wps:ResponseForm>
</wps:Execute>
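An XML Execute request like the one above is sent to the service with an HTTP POST. The following is a minimal sketch using the HTTP client of Java 11 and later; the class name ExecutePost, the Content-Type value and the command-line arguments are assumptions made for the example, and the endpoint has to be replaced with your own WPS URL.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExecutePost {

    /** Builds (but does not send) a POST request carrying the Execute document. */
    public static HttpRequest buildRequest(String endpoint, String executeXml) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "text/xml; charset=UTF-8") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofString(executeXml))
                .build();
    }

    /** Sends the request and returns the response body as text. */
    public static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: WPS endpoint, e.g. http://localhost:8080/wps/WebProcessingService
        // args[1]: path to a file containing the Execute XML document
        String xml = Files.readString(Path.of(args[1]));
        System.out.println(send(buildRequest(args[0], xml)));
    }
}
```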

KVP (Key Value Pairs) request example

http://t2-10-11-30-97.play.terradue.int:8888/wps/WebProcessingService?Request=Execute&service=WPS&version=1.0.0&language=en-CA&Identifier=com.terradue.wps_hadoop.examples.bathymetry.BathymetryAlgorithm&DataInputs=InputFile=http://t2-10-11-30-97.play.terradue.int:8888/wps/maps/coordinates


Resampler Algorithm

Class Diagram

ResamplerAlgorithm.png

ResamplerAlgorithm.xml

<?xml version="1.0" encoding="UTF-8"?>
<wps:ProcessDescriptions xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
http://geoserver.itc.nl:8080/wps/schemas/wps/1.0.0/wpsDescribeProcess_response.xsd" xml:lang="en-US" service="WPS" version="1.0.0">
        <ProcessDescription wps:processVersion="1.0.0" storeSupported="true" statusSupported="false">
                <ows:Identifier>com.terradue.wps_hadoop.examples.resampler.ResamplerAlgorithm</ows:Identifier>
                <ows:Title>Resampler Algorithm</ows:Title>
                <ows:Abstract>by Hadoop</ows:Abstract>
                <ows:Metadata xlink:title="resampler" />
                <DataInputs>
                        <Input minOccurs="1" maxOccurs="1">
                                <ows:Identifier>wcs_url</ows:Identifier>
                                <ows:Title>wcs_url</ows:Title>
                                <ows:Abstract>wcs_url</ows:Abstract>
                                <LiteralData>
                                        <ows:DataType ows:reference="xs:string"></ows:DataType>
                                        <ows:AnyValue/>
                                </LiteralData>
                        </Input>
                        <Input minOccurs="1" maxOccurs="1">
                                <ows:Identifier>resolution</ows:Identifier>
                                <ows:Title>resolution</ows:Title>
                                <ows:Abstract>resolution</ows:Abstract>
                                <LiteralData>
                                        <ows:DataType ows:reference="xs:string"></ows:DataType>
                                        <ows:AnyValue/>
                                </LiteralData>
                        </Input>
                </DataInputs>
                <ProcessOutputs>
                        <Output>
                                <ows:Identifier>result</ows:Identifier>
                                <ows:Title>result</ows:Title>
                                <ows:Abstract>result</ows:Abstract>
                                <LiteralOutput>
                                        <ows:DataType ows:reference="xs:string"/>
                                </LiteralOutput>
                        </Output>
                </ProcessOutputs>
        </ProcessDescription>
</wps:ProcessDescriptions>

Request examples

This is an example of how to request the execution of the ResamplerAlgorithm.

XML request example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<wps:Execute service="WPS" version="1.0.0" xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" 
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
    http://schemas.opengis.net/wps/1.0.0/wpsExecute_request.xsd">
    <ows:Identifier>com.terradue.wps_hadoop.examples.resampler.ResamplerAlgorithm</ows:Identifier>
    <wps:DataInputs>
        <wps:Input>
            <ows:Identifier>wcs_url</ows:Identifier>
            <ows:Title>WCS product's URL to be resampled</ows:Title>
            <wps:Data>
                <wps:LiteralData>http://t2-10-11-30-98.play.terradue.int:8080/thredds/wcs/maps/SST_MED_SST_L4_NRT_OBSERVATIONS_010_004_c_2011-11-03_2011-11-04.nc?service=WCS&amp;version=1.0.0&amp;request=GetCoverage&amp;COVERAGE=analysed_sst&amp;bbox=-18,20,36,45&amp;width=100&amp;height=100&amp;format=geotiff</wps:LiteralData>
            </wps:Data>
        </wps:Input>
        <wps:Input>
            <ows:Identifier>resolution</ows:Identifier>
            <ows:Title>Desired resolution (in degrees)</ows:Title>
            <wps:Data>
                <wps:LiteralData>0.01666923868312760</wps:LiteralData>
            </wps:Data>
        </wps:Input>
    </wps:DataInputs>
    <wps:ResponseForm>
    <wps:ResponseDocument storeExecuteResponse="false">
        <wps:Output asReference="false">
            <ows:Identifier>result</ows:Identifier>
        </wps:Output>
    </wps:ResponseDocument>
    </wps:ResponseForm>
</wps:Execute>

Input parameter description

wcs_url: the URL of a WCS request that is queried to get a Coverage;

resolution: the desired resolution, in degrees, for the downloaded Coverage.

KVP (Key Value Pairs) request example

http://t2-10-11-30-97.play.terradue.int:8888/wps/WebProcessingService?Request=Execute&service=WPS&version=1.0.0&language=en-CA&Identifier=com.terradue.wps_hadoop.examples.resampler.ResamplerAlgorithm&DataInputs=wcs_url=http://t2-10-11-30-98.play.terradue.int:8080/thredds/wcs/maps/SST_MED_SST_L4_NRT_OBSERVATIONS_010_004_c_2011-11-03_2011-11-04.nc?service=WCS&version=1.0.0&request=GetCoverage&COVERAGE=analysed_sst&bbox=-18,20,36,45&width=100&height=100&format=geotiff;resolution=0.01666923868312760

Before submitting this kind of request you need to encode it, so that each & inside the wcs_url value becomes %26:

http://t2-10-11-30-97.play.terradue.int:8888/wps/WebProcessingService?Request=Execute&service=WPS&version=1.0.0&language=en-CA&Identifier=com.terradue.wps_hadoop.examples.resampler.ResamplerAlgorithm&DataInputs=wcs_url=http://t2-10-11-30-98.play.terradue.int:8080/thredds/wcs/maps/SST_MED_SST_L4_NRT_OBSERVATIONS_010_004_c_2011-11-03_2011-11-04.nc?service=WCS%26version=1.0.0%26request=GetCoverage%26COVERAGE=analysed_sst%26bbox=-18,20,36,45%26width=100%26height=100%26format=geotiff;resolution=0.01666923868312760
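Encoding by hand is error-prone, so the KVP URL can also be assembled in code. The following is a minimal sketch, assuming Java 10 or later; the class name KvpExecuteUrl is made up for the example. Note that URLEncoder applies form encoding, so it also escapes characters such as ':' , '/' and ',' that the hand-encoded URL above leaves untouched.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class KvpExecuteUrl {

    /** Builds a KVP Execute URL; every input value is percent-encoded on its own. */
    public static String build(String endpoint, String identifier, Map<String, String> inputs) {
        // Inputs are separated by ';' and written as name=value, as in the examples on this page.
        StringJoiner dataInputs = new StringJoiner(";");
        for (Map.Entry<String, String> input : inputs.entrySet()) {
            dataInputs.add(input.getKey() + "="
                    + URLEncoder.encode(input.getValue(), StandardCharsets.UTF_8));
        }
        return endpoint + "?Request=Execute&service=WPS&version=1.0.0"
                + "&Identifier=" + identifier
                + "&DataInputs=" + dataInputs;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>(); // keeps insertion order
        inputs.put("wcs_url", "http://example.org/wcs?service=WCS&version=1.0.0"); // placeholder URL
        inputs.put("resolution", "0.01666923868312760");
        System.out.println(build("http://localhost:8080/wps/WebProcessingService",
                "com.terradue.wps_hadoop.examples.resampler.ResamplerAlgorithm", inputs));
    }
}
```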


More information

In order to exploit the real potential of Hadoop, more than one request should be sent to the WPS concurrently.
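One possible way to send several requests at once from a client is a small thread pool. The sketch below is generic: each request is represented by a Callable that returns the response text, and the callables in main are stand-ins that do not contact any server. The class name ConcurrentRequests and the pool size are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentRequests {

    /** Runs all requests on a fixed thread pool and returns the results in submission order. */
    public static List<String> runAll(List<Callable<String>> requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<String> future : pool.invokeAll(requests)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for responses", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> requests = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int n = i;
            // Stand-in for an HTTP call to the WPS Execute operation.
            requests.add(() -> "response " + n);
        }
        System.out.println(runAll(requests, 3));
    }
}
```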

Intersection Algorithm

Class Diagram

IntersectionAlgorithm.png

IntersectionHadoopAlgorithm.xml

<?xml version="1.0" encoding="UTF-8"?>
<wps:ProcessDescriptions xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
http://geoserver.itc.nl:8080/wps/schemas/wps/1.0.0/wpsDescribeProcess_response.xsd" xml:lang="en-US" service="WPS" version="1.0.0">
        <ProcessDescription wps:processVersion="1.0.0" storeSupported="true" statusSupported="false">
                <ows:Identifier>com.terradue.wps_hadoop.examples.IntersectionAlgorithm</ows:Identifier>
                <ows:Title>Intersection Algorithm</ows:Title>
                <ows:Abstract>Calculate Intersection Feature</ows:Abstract>
                <ows:Metadata xlink:title="Intersection" />
                <DataInputs>
                        <Input minOccurs="1" maxOccurs="1">
                                <ows:Identifier>Polygon1</ows:Identifier>
                                <ows:Title>Polygon1</ows:Title>
                                <ows:Abstract>Polygon 1 URL</ows:Abstract>
                                <LiteralData>
                                        <ows:DataType ows:reference="xs:string"></ows:DataType>
                                        <ows:AnyValue/>
                                </LiteralData>
                        </Input>
                        <Input minOccurs="1" maxOccurs="1">
                                <ows:Identifier>Polygon2</ows:Identifier>
                                <ows:Title>Polygon2</ows:Title>
                                <ows:Abstract>Polygon 2 URL</ows:Abstract>
                                <LiteralData>
                                        <ows:DataType ows:reference="xs:string"></ows:DataType>
                                        <ows:AnyValue/>
                                </LiteralData>
                        </Input>
                </DataInputs>
                <ProcessOutputs>
                        <Output>
                                <ows:Identifier>result</ows:Identifier>
                                <ows:Title>result</ows:Title>
                                <ows:Abstract>result</ows:Abstract>
                                <LiteralOutput>
                                        <ows:DataType ows:reference="xs:string"/>
                                </LiteralOutput>
                        </Output>
                </ProcessOutputs>
        </ProcessDescription>
</wps:ProcessDescriptions>

Request examples

This is an example of how to request the execution of the IntersectionHadoopAlgorithm.

XML request example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<wps:Execute service="WPS" version="1.0.0" xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" 
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
    http://schemas.opengis.net/wps/1.0.0/wpsExecute_request.xsd">
    <ows:Identifier>com.terradue.wps_hadoop.examples.intersection.IntersectionHadoopAlgorithm</ows:Identifier>
    <wps:DataInputs>
        <wps:Input>
            <ows:Identifier>Polygon1</ows:Identifier>
            <ows:Title>First polygon to be intersected</ows:Title>
            <wps:Data>
                <wps:LiteralData>http://www.fao.org/figis/geoserver/ows?service=WFS&amp;version=1.0.0&amp;request=GetFeature&amp;typeName=fifao:FAO_MAJOR&amp;CQL_FILTER=fifao:F_AREA=21</wps:LiteralData>
            </wps:Data>
        </wps:Input>
        <wps:Input>
            <ows:Identifier>Polygon2</ows:Identifier>
            <ows:Title>Second Polygon to be intersected</ows:Title>
            <wps:Data>
                <wps:LiteralData>http://geo.vliz.be/geoserver/Marbound/ows?service=WFS&amp;version=1.0.0&amp;request=GetFeature&amp;typeName=Marbound:eez&amp;CQL_FILTER%3DINTERSECTS%28Marbound%3Athe_geom%2CPOLYGON%28%28-82.410003662+34.830001831%2C-42+34.830001831%2C-42+78.166666031%2C-82.410003662+78.166666031%2C-82.410003662+34.830001831%29%29%29</wps:LiteralData>
            </wps:Data>
        </wps:Input>
    </wps:DataInputs>
    <wps:ResponseForm>
    <wps:ResponseDocument storeExecuteResponse="false">
        <wps:Output asReference="false">
            <ows:Identifier>result</ows:Identifier>
        </wps:Output>
    </wps:ResponseDocument>
    </wps:ResponseForm>
</wps:Execute>


Input parameter description

Polygon1 and Polygon2: URLs of WFS requests, each returning a feature that contains a valid geometry.

KVP (Key Value Pairs) request example

http://t2-10-11-30-97.play.terradue.int:8888/wps/WebProcessingService?Request=Execute&service=WPS&version=1.0.0&language=en-CA&Identifier=com.terradue.wps_hadoop.examples.intersection.IntersectionHadoopAlgorithm&DataInputs=Polygon1=http://www.fao.org/figis/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=fifao:FAO_MAJOR&CQL_FILTER=fifao:F_AREA=21;Polygon2=http://geo.vliz.be/geoserver/Marbound/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=Marbound:eez&CQL_FILTER%3DINTERSECTS%28Marbound%3Athe_geom%2CPOLYGON%28%28-82.410003662+34.830001831%2C-42+34.830001831%2C-42+78.166666031%2C-82.410003662+78.166666031%2C-82.410003662+34.830001831%29%29%29

Before submitting this kind of request you need to encode it as follows:

http://localhost:8888/wps/WebProcessingService?Request=Execute&service=WPS&version=1.0.0&language=en-CA&Identifier=com.terradue.wps_hadoop.examples.intersection.IntersectionHadoopAlgorithm&DataInputs=Polygon1=http%3A%2F%2Fwww.fao.org%2Ffigis%2Fgeoserver%2Fows%3Fservice%3DWFS%26version%3D1.0.0%26request%3DGetFeature%26typeName%3Dfifao%3AFAO_MAJOR%26CQL_FILTER%3Dfifao%3AF_AREA%3D21;Polygon2=http%3A%2F%2Fgeo.vliz.be%2Fgeoserver%2FMarbound%2Fows%3Fservice%3DWFS%26version%3D1.0.0%26request%3DGetFeature%26typeName%3DMarbound%3Aeez%26CQL_FILTER=INTERSECTS%28Marbound%3Athe_geom%2CPOLYGON%28%28-82.410003662+34.830001831%2C-42+34.830001831%2C-42+78.166666031%2C-82.410003662+78.166666031%2C-82.410003662+34.830001831%29%29%29


More information

This algorithm is only a very simple translation of the original IntersectionAlgorithm developed by 52North. This template was sent to and discussed with FAO (Emmanuel Blondel) so that it can be improved according to their requirements.