Blog
what did i learn today
[oracle] avoiding SLOW sdo_aggr_union

There is this recurring problem we have in GIS: getting road-segments and wanting to show complete roads. The naive approach would we to do something like the following:

insert into street_geoms
select ro.rd_ro_ident, ro.rd_ro_name, ro.com_code, ssdo_aggr_union(sdoaggrtype(rd.ro_geometry, 0.005)) as geom
from rd_road ro, rd_ro_sec ros
where ros.rd_ro_ident = ro.rd_ro_ident
group by ro.rd_ro_ident, ro.rd_ro_name, ro.com_code;

For good measure: we have 45.000+ roads, with a total of 230.000+ road segments. So when that query starts running and starts taking a long time, I started googling. Apparently there are two faster alternatives: SDO_AGGR_CONCAT_LINES and SDO_AGGR_SET_UNION. While the first was really quick, completed in minutes, the result was completely wrong (complete segments were missing). The second might be quicker, but it was really hard to get an idea about any progress, and if it would fail, everything should be lost (rolled back).

So I decided to write a little script, and issue a sql statement for each single road, allowing me to track progress and added restartibility. For each road I issued a statement like:

insert into street_geoms
select ro.rd_ro_ident, ro.rd_ro_name, ro.com_code, sdo_aggr_set_union(CAST(COLLECT(ros.rd_ros_geometry) AS mdsys.SDO_Geometry_Array),0.005) as geom
from rd_road ro, rd_ro_sec ros
where ros.rd_ro_ident = ro.rd_ro_ident
  and ro.rd_ro_ident = 1895101 
group by ro.rd_ro_ident, ro.rd_ro_name, ro.com_code;

I added some ruby code around it, to make sure it tracked the progress and calculated the remaining time, just to have an idea. The first "large" road it stumbled upon literally took hours. It only had to join 39 segments. A simple query learned I had 150+ roads with more segments, and a maximum of 125 segments in the database. I could not just simply ignore them :) So this was not going to work either.

Why would this be so hard? I just wanted to throw all linestrings together into one geometry. How could I do that? Querying the geometries was really easy, so what if I joined the geometries outside of oracle? And wouldn't that be hard? But there is a simple solution: convert the strings to WKT, and join all LINESTRING in a MULTILINESTRING. This would just be simple string manipulation. I can do that ;)

I had some hiccups with this approach: handling the long strings proved a bit akward (use CLOB instead) and I had to regularly call GC.start to make sure the open cursors were released. And I had to make sure not to build a string literal which was too long (ORA-06550).

But in the end I was able to join the road-sections for the 45.000 + roads in approx 1.5h, which is not blindingly fast, but faster than 1 single SDO_AGGR_SET_UNION operation :) :)

For reference you can see the full code:

class StreetGeom < ActiveRecord::Base
  self.primary_key = 'rd_ro_ident'
end

def format_time (t)
  t = t.to_i
  sec = t % 60
  min = (t / 60) % 60
  hour = t / 3600
  sprintf("% 3d:%02d:%02d", hour, min, sec)
end

def eta(count)
  if count == 0
    "ETA: --:--:--"
  else
    elapsed = Time.now - @start_time
    # eta = elapsed * @total / count - elapsed;
    eta = (elapsed / count) * (@total - count)

    sprintf("ETA: %s", format_time(eta))
  end
end

all_roads = Road.count
geoms_to_calculate = all_roads - StreetGeom.count
@total = geoms_to_calculate

puts "Joining geometries for #{all_roads} roads [still #{geoms_to_calculate} to do]"

cntr = 1
@start_time = Time.now

done = 0

Road.order(:rd_ro_ident).each do |road|
  street_count = StreetGeom.where(rd_ro_ident: road.rd_ro_ident).count
  print "\rConverting #{cntr}/#{all_roads} [#{eta(done)}] "
  if street_count == 0
    print "..."
    $stdout.flush

    ## get all geometries in WKT format
    get_geoms_sql = <<-SQL
      select sdo_cs.make_2d(ros.rd_ros_geometry).get_wkt() as wkt_geom from rd_ro_sec ros where ros.rd_ro_ident = #{road.rd_ro_ident}
    SQL

    cursor = Road.connection.execute(get_geoms_sql)

    line_strings = []

    while row = cursor.fetch
      line_string = row[0].read.to_s
      line_strings << line_string[10..-1]
    end

    insert_sql = <<-SQL
      DECLARE
        wkt_str clob;
      BEGIN
        wkt_str := 'MULTILINESTRING(#{line_strings.join(", ';\nwkt_str := wkt_str || '")})';
        insert into street_geoms(rd_ro_ident, name, com_code, geom)
        values (#{road.rd_ro_ident}, q'[#{road.rd_ro_name}]', '#{road.com_code}',
             sdo_util.from_wktgeometry(to_clob(wkt_str)) );
      END;
    SQL

    Road.connection.execute(insert_sql)
    done += 1
  else
    print "_"
  end

  cntr += 1

  # periodically cleanup GC so we release open cursors ...
  # to avoid ORA-1000 errors
  if (cntr % 50) == 0
    GC.start
  end
end

print "\n"
puts "\n\nDone!"

and I run this script in the rails environment as follows: rails runner lib\tasks\join_road_geometries.rb.

Spatial DB Advisor

PL/SQL package: lessons learned

Working with Oracle Spatial a lot, I found a great source of information in the Spatial DB Advisor, which besides a lot of interesting articles, also offers his source-code and PL/SQL packages for free. The site is not very user-friendly (trouble of seeing the threes through the wood), as finding the free packages is something that i only succeed in coming from Google :) A shortcut: you can find them here. But a word of warning: if you try to install this package in an existing schema, it will delete all indexes and tables. This is something i had to discover the hard way (painful i might add). So i have adapted my version of create_test_data.sql, where at the top of the file all indexes and tables of the user are dropped. Nevertheless, extremely useful package. I will list my favourites (for now):

  • isCompound: checks whether a geometry contains any circular/arc elements
  • ConvertGeometry: converts any special elements - circulararcs, rectangles, circles - in a geometry to vertex to vertex linestrings
  • to_2d: converts a sdo_geometry to 2d. Very useful for us, as geoserver can't draw 2d, and we only use 2,5d anyway. Meaning that we always use a topview, and the z is the elevation level of the ground-level. Or for pipelines: below ground :)
  • sdo_mbr: function to determine the minimum bounding rectangle! i will use this to adapt my script to fill user_sdo_geom_metadata. Cool :)
  • functions to investigate the data of sdo_geometry (number of rings, number of vertexes, number of coordinates, the coordinates itself, ...)
  • functions to investigate the meta-data
  • functions to manipulate your sdo-geometry: scale, rotate, affine transformations, move, adding and removing points, ...
  • ... and lots more ... In short: awesome!! So after getting a terrifying scare, now rebuilding the databases (and adapted the script: won't happen to me again). I mailed the original creator to add a note of warning on his site too. Might help other absent-minded developers :) I also had another smaller problem, because the install-script overruled my ORACLE_HOME and PATH definition, which made sure SQL*PLUS was not found. But that was easily cured. I am very grateful that he puts up this package for free. And i learned another great lesson today: not to get too enthusiastic before i am sure that it all works ;)

When storing spatial data into Oracle, there are a few steps one needs to complete (as shown here too):

  • insert the data (obviously)
  • update the USER_SDO_GEOM_METADATA table. This table specifies for each column the bounding box and the SRID (coordinate system).
  • create a spatial index I have created a script to this automatically for me, once all tables are filled with their data (e.g. after import or after using FME to import your data). [sourcecode language="sql"] set serverout on DECLARE schema_orig varchar2(100) := upper('&user_orig'); CURSOR ctab is select TABLE_NAME from sys.dba_tables where owner = schema_orig; tn sys.dba_tab_columns.TABLE_NAME%TYPE ; CURSOR ctabcol is select column_name,DATA_TYPE from sys.dba_tab_columns where owner = schema_orig and table_name = tn ; collist varchar2(2000) ; query varchar2(2000) ; column_name varchar2(200); has_geometric_column boolean; col_count number; BEGIN -- dbms_output.enable(1000000); FOR ctabrec IN ctab LOOP tn := ctabrec.table_name ; has_geometric_column := false; col_count := 0; for ctabcolrec IN ctabcol LOOP if ctabcolrec.data_type = 'SDO_GEOMETRY' then has_geometric_column := true; col_count := col_count + 1; column_name := ctabcolrec.column_name; -- insert data into user_sdo_geom query := 'INSERT INTO USER_SDO_GEOM_METADATA (TABLE_NAME, COLUMN_NAME, DIMINFO, SRID) VALUES ('''||tn|| ''' , '''|| COLUMN_NAME ||''', MDSYS.SDO_DIM_ARRAY( MDSYS.SDO_DIM_ELEMENT(''X'', 0, 10000000, 0.000005),MDSYS.SDO_DIM_ELEMENT(''Y'', 0, 10000000, 0.000005), MDSYS.SDO_DIM_ELEMENT(''Z'', 0,10000000, 0.000005)), &srid)'; dbms_output.put_line(query); BEGIN execute immediate query; EXCEPTION WHEN OTHERS THEN dbms_output.put_line('**** failed to insert row into user_sdo_geom_metadata for '||tn||'('||column_name||')'); --rollback; ignore any errors! there will be existing columns END; -- create spatial index query := 'CREATE INDEX '||tn||'_'||to_char(col_count)||'_SX ON '||tn||'('|| COLUMN_NAME ||') INDEXTYPE IS MDSYS.SPATIAL_INDEX'; dbms_output.put_line(query); BEGIN execute immediate query; EXCEPTION WHEN OTHERS THEN dbms_output.put_line('**** failed to create spatial index for '||tn||'('||column_name||')'); --rollback; ignore any errors END; end if; END LOOP; END LOOP; commit; END; / [/sourcecode] The script has two parameters: the schema from where to scan all tables, and the srid (coordinate system) which needs to be filled in. Any thoughts on this? suggestions?