Blog
what did i learn today
[oracle] avoiding SLOW sdo_aggr_union

There is this recurring problem we have in GIS: getting road-segments and wanting to show complete roads. The naive approach would we to do something like the following:

insert into street_geoms
select ro.rd_ro_ident, ro.rd_ro_name, ro.com_code, ssdo_aggr_union(sdoaggrtype(rd.ro_geometry, 0.005)) as geom
from rd_road ro, rd_ro_sec ros
where ros.rd_ro_ident = ro.rd_ro_ident
group by ro.rd_ro_ident, ro.rd_ro_name, ro.com_code;

For good measure: we have 45.000+ roads, with a total of 230.000+ road segments. So when that query starts running and starts taking a long time, I started googling. Apparently there are two faster alternatives: SDO_AGGR_CONCAT_LINES and SDO_AGGR_SET_UNION. While the first was really quick, completed in minutes, the result was completely wrong (complete segments were missing). The second might be quicker, but it was really hard to get an idea about any progress, and if it would fail, everything should be lost (rolled back).

So I decided to write a little script, and issue a sql statement for each single road, allowing me to track progress and added restartibility. For each road I issued a statement like:

insert into street_geoms
select ro.rd_ro_ident, ro.rd_ro_name, ro.com_code, sdo_aggr_set_union(CAST(COLLECT(ros.rd_ros_geometry) AS mdsys.SDO_Geometry_Array),0.005) as geom
from rd_road ro, rd_ro_sec ros
where ros.rd_ro_ident = ro.rd_ro_ident
  and ro.rd_ro_ident = 1895101 
group by ro.rd_ro_ident, ro.rd_ro_name, ro.com_code;

I added some ruby code around it, to make sure it tracked the progress and calculated the remaining time, just to have an idea. The first "large" road it stumbled upon literally took hours. It only had to join 39 segments. A simple query learned I had 150+ roads with more segments, and a maximum of 125 segments in the database. I could not just simply ignore them :) So this was not going to work either.

Why would this be so hard? I just wanted to throw all linestrings together into one geometry. How could I do that? Querying the geometries was really easy, so what if I joined the geometries outside of oracle? And wouldn't that be hard? But there is a simple solution: convert the strings to WKT, and join all LINESTRING in a MULTILINESTRING. This would just be simple string manipulation. I can do that ;)

I had some hiccups with this approach: handling the long strings proved a bit akward (use CLOB instead) and I had to regularly call GC.start to make sure the open cursors were released. And I had to make sure not to build a string literal which was too long (ORA-06550).

But in the end I was able to join the road-sections for the 45.000 + roads in approx 1.5h, which is not blindingly fast, but faster than 1 single SDO_AGGR_SET_UNION operation :) :)

For reference you can see the full code:

class StreetGeom < ActiveRecord::Base
  self.primary_key = 'rd_ro_ident'
end

def format_time (t)
  t = t.to_i
  sec = t % 60
  min = (t / 60) % 60
  hour = t / 3600
  sprintf("% 3d:%02d:%02d", hour, min, sec)
end

def eta(count)
  if count == 0
    "ETA: --:--:--"
  else
    elapsed = Time.now - @start_time
    # eta = elapsed * @total / count - elapsed;
    eta = (elapsed / count) * (@total - count)

    sprintf("ETA: %s", format_time(eta))
  end
end

all_roads = Road.count
geoms_to_calculate = all_roads - StreetGeom.count
@total = geoms_to_calculate

puts "Joining geometries for #{all_roads} roads [still #{geoms_to_calculate} to do]"

cntr = 1
@start_time = Time.now

done = 0

Road.order(:rd_ro_ident).each do |road|
  street_count = StreetGeom.where(rd_ro_ident: road.rd_ro_ident).count
  print "\rConverting #{cntr}/#{all_roads} [#{eta(done)}] "
  if street_count == 0
    print "..."
    $stdout.flush

    ## get all geometries in WKT format
    get_geoms_sql = <<-SQL
      select sdo_cs.make_2d(ros.rd_ros_geometry).get_wkt() as wkt_geom from rd_ro_sec ros where ros.rd_ro_ident = #{road.rd_ro_ident}
    SQL

    cursor = Road.connection.execute(get_geoms_sql)

    line_strings = []

    while row = cursor.fetch
      line_string = row[0].read.to_s
      line_strings << line_string[10..-1]
    end

    insert_sql = <<-SQL
      DECLARE
        wkt_str clob;
      BEGIN
        wkt_str := 'MULTILINESTRING(#{line_strings.join(", ';\nwkt_str := wkt_str || '")})';
        insert into street_geoms(rd_ro_ident, name, com_code, geom)
        values (#{road.rd_ro_ident}, q'[#{road.rd_ro_name}]', '#{road.com_code}',
             sdo_util.from_wktgeometry(to_clob(wkt_str)) );
      END;
    SQL

    Road.connection.execute(insert_sql)
    done += 1
  else
    print "_"
  end

  cntr += 1

  # periodically cleanup GC so we release open cursors ...
  # to avoid ORA-1000 errors
  if (cntr % 50) == 0
    GC.start
  end
end

print "\n"
puts "\n\nDone!"

and I run this script in the rails environment as follows: rails runner lib\tasks\join_road_geometries.rb.

We are in the process of migrating an old GIS system. For our new systems we use POSTGIS. But this one still uses oracle. The data is spanning two countries: Belgium and the Netherlands. Our system does something awful: all data is stored in RD (the dutch coordinate system, using Oracle SRID 90112).

So how do we get data into the system: belgian data is entered as Lambert 72 (oracle srid 327680) and then transformed to 90112.

Our client uses a customised viewer that shows the data either in RD or Lambert72. Now we want to switch to a more generic solution, and show the data in WGS84. We are using oracle 11, so my initial process was the following

  • extract belgian data from tables, convert back to 327680 (SDO_CS.transform(geom, 327680))
  • set the SRID to 31370 (which is the correct/best srid for belgium --it has the correct transformation to wgs84) as follows: update be_geoms bg set bg.geom.sdo_srid = 31370 (so without transformation)
  • for dutch data I just set it to 28992
  • and then I transform both to WGS!

Easy! done! ready! However ... I was not ... The data was not positioned correctly. So I checked the definition in MDSYS.CS_SRS for both 28992 and 31370 and compared it to epsg.io and lo and behold: both where wrong. So now I had to update them.

Updating EPSG:31370

delete from mdsys.cs_srs where srid=31370;
Insert into MDSYS.CS_SRS (CS_NAME,SRID,AUTH_SRID,AUTH_NAME,WKTEXT,CS_BOUNDS,WKTEXT3D) values ('Belge 1972 / Belgian Lambert 72',31370,31370,'IGN Brussels www.ngi.be/html-files/french/0038.html','PROJCS["Belge 1972 / Belgian Lambert 72", GEOGCS ["Belge 1972", DATUM ["Reseau National Belge 1972 (EPSG ID 6313)", SPHEROID ["International 1924 (EPSG ID 7022)", 6378388.0, 297.0], -106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747], PRIMEM ["Greenwich", 0.000000], UNIT ["Decimal Degree", 0.0174532925199433]], PROJECTION ["Lambert Conformal Conic"], PARAMETER ["Latitude_Of_Origin", 90.0], PARAMETER ["Central_Meridian", 4.3674866666666667], PARAMETER ["Standard_Parallel_1", 51.1666672333333333], PARAMETER ["Standard_Parallel_2", 49.8333339], PARAMETER ["False_Easting", 150000.013], PARAMETER ["False_Northing", 5400088.438], UNIT ["Meter", 1.0]]',null,'PROJCS[
  "Belge 1972 / Belgian Lambert 72",
  GEOGCS["Belge 1972",
    DATUM["Reseau National Belge 1972",
      SPHEROID[
        "International 1924",
        6378388.0,
        297.0,
        AUTHORITY["EPSG", "7022"]],
      TOWGS84[-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747],
      AUTHORITY["EPSG", "6313"]],
    PRIMEM["Greenwich", 0.000000, AUTHORITY["EPSG","8901"]],
    UNIT["degree (supplier to define representation)", 0.0174532925199433, AUTHORITY["EPSG", "9122"]],
    AXIS["Lat", NORTH],
    AXIS["Long", EAST],
    AUTHORITY["EPSG", "4313"]],
  PROJECTION ["Lambert Conformal Conic"],
  PARAMETER ["Latitude_Of_Origin", 90.0],
  PARAMETER ["Central_Meridian", 4.3674866666666667],
  PARAMETER ["Standard_Parallel_1", 51.1666672333333333],
  PARAMETER ["Standard_Parallel_2", 49.8333339],
  PARAMETER ["False_Easting", 150000.013],
  PARAMETER ["False_Northing", 5400088.438],
  UNIT["metre", 1.0, AUTHORITY["EPSG", "9001"]],
  AXIS["X", EAST],
  AXIS["Y", NORTH],
  AUTHORITY["EPSG", "31370"]]');

... and this worked and now my transformation for Lambert is correct!

Updating EPSG:28992

... proved to be a little trickier. I assumed I could just reuse the same method as for the belgian coordinate system (yes, I know, assume = ass-u-me).

I was unable to just delete or update 28992 because I got an error that a child record existed: ORA-02292 with reason COORD_OPERATION_FOREIGN_SOURCE. Googling this revealed nothing at all.

So I had to dig deeper. And deeper. Actually MDSYS.CS_SRS is actually a view which tries to update the underlying tables. And the TOWGS84 coordinates, which I had to change/update, are stored in SDO_DATUM. So after some searching, it actually proved to be quite easy. To updated the EPSG:28992, I just had to do:

update mdsys.sdo_datums set
  shift_x = 565.417,
  shift_y = 50.3319,
  shift_z = 465.552,
  rotate_x = -0.398957,
  rotate_y = 0.343988,
  rotate_z = -1.8774,
  scale_adjust = 4.0725
where datum_id = 6289;

EXECUTE SDO_CS.UPDATE_WKTS_FOR_EPSG_DATUM(6289);

My first initial (naive) assumption was that the SDO_CS.UPDATE_... functions would actually retrieve the latest EPSG definitions, unfortunately no such luck :) :)

Stuff like this makes me appreciate PostGIS even more.

I have a very weird problem with my geoserver+oracle, when deployed on a Windows 2012R2 server (see here), and in attempting to solve that, I upgraded the geoserver from 2.6.3 to 2.7.1, hoping that that would fix that.

Sometimes fairy tales come true, but in this case it did not help, unfortunately. The 2.7.1 did render a lot quicker, except one layer which did not render at all anymore.

My style could not render with the error The requested Style can not be used with this layer. The style specifies an attribute of <missing attribute name>. Checking the layer in geoserver, I could see it was no longer to determine any of the attributes for the given table.

Further investigation in the logfile revealed the following (cryptic) error:

Failure occurred while looking up the primary key with finder: org.geotools.jdbc.HeuristicPrimaryKeyFinder@24cf7139

java.sql.SQLException: Exhausted Resultset

Mmmmmm. Luckily my google-fu revealed a linked issue, and simple solution:

update the driver from ojdbc14.jar to the newer ojdbc7.jar fixes this problem.

Hehe :)

Updating geoserver did not fix my problem: my layer still had some duplicate columnnames. This might not be such a big problem: everything is drawn correctly, WMS calls work, but WFS calls gave the irritating yet predictable error ORA-00918: column ambiguously defined. Annoying.

So how does one find column-names for a table in oracle? With a query like:

select * from dba_tab_columns where table_name = 'YOUR_TABLE_NAME';

and all of a sudden I saw the same set of column-names, with some duplication. Apparently my oracle database contains the table twice, in two different schema's. Since my user had the permissions to access the other schema, it seems geoserver does not limit the query to the (specified) schema at all.

The fix then was easy: make the other schema unaccessible. In my case the second schema was for testing purposes, so I could just delete it.

[oracle] changing the value of a sequence

According to oracle documentation, to change a value of a sequence, you have to drop and recreate it, using the following command:

CREATE SEQUENCE table_name_seq START WITH 12345;

But there are some easy ways to change the value of an existing sequence too.

If you want to increment the current value by 500, you can just use

select your_sequence_name.nextval from dual connect by level <= 500;

If you want to decrement it, you can do that as follows:

alter sequence id_sequence increment by -500;
select id_sequence.nextval from dual;
alter sequence id_sequence increment by 1;

(of course this can also be used to increment it, but the connect by level trick is easier then)

I am currently converting an Oracle database to Postgis. Instead of blindingly copying the data model, I am also checking which columns are actually really used, and drop those that are never used.

In most tables it is pretty easy, I can do a quick visual check and then count that one column that seems to be zero all the time. But we have a few tables with 30-50 columns.

For such a table there is also an easy way:

SELECT t.column_name
FROM user_tab_columns t
WHERE t.nullable = 'Y'
    AND t.table_name = 'YOUR_TABLE_NAME_HERE'
    AND t.num_distinct = 0

Mind you, for this to work, your database must have gathered the statistics (if you haven't done this before, this will also help your performance).

BEGIN
  DBMS_STATS.gather_database_stats();
END;

When using Rails on top of an Oracle database, you use the oracle enhanced activerecord adapter. This adapter (or Rails) has one weird side-effect: there is no way to avoid using a sequence when inserting a new record. But, in this comment, Raimonds Simanovskis hints at the solution. In an initializer you write: [ruby] # a small patch as proposed by the author of OracleEnhancedAdapter: http://blog.rayapps.com/2008/05/13/activerecord-oracle-enhanced-adapter/#comment-240 # if a ActiveRecord model has a sequence with name "autogenerated", the id will not be filled in from any sequence ActiveRecord::ConnectionAdapters::OracleEnhancedAdapter.class_eval do alias_method :orig_next_sequence_value, :next_sequence_value def next_sequence_value(sequence_name) if sequence_name == 'autogenerated' # we assume id must have gotten a good value before insert! id else orig_next_sequence_value(sequence_name) end end end [/ruby] ... and in your model you add: [ruby] set_sequence_name 'autogenerated' [/ruby]

Spatial DB Advisor

PL/SQL package: lessons learned

Working with Oracle Spatial a lot, I found a great source of information in the Spatial DB Advisor, which besides a lot of interesting articles, also offers his source-code and PL/SQL packages for free. The site is not very user-friendly (trouble of seeing the threes through the wood), as finding the free packages is something that i only succeed in coming from Google :) A shortcut: you can find them here. But a word of warning: if you try to install this package in an existing schema, it will delete all indexes and tables. This is something i had to discover the hard way (painful i might add). So i have adapted my version of create_test_data.sql, where at the top of the file all indexes and tables of the user are dropped. Nevertheless, extremely useful package. I will list my favourites (for now):

  • isCompound: checks whether a geometry contains any circular/arc elements
  • ConvertGeometry: converts any special elements - circulararcs, rectangles, circles - in a geometry to vertex to vertex linestrings
  • to_2d: converts a sdo_geometry to 2d. Very useful for us, as geoserver can't draw 2d, and we only use 2,5d anyway. Meaning that we always use a topview, and the z is the elevation level of the ground-level. Or for pipelines: below ground :)
  • sdo_mbr: function to determine the minimum bounding rectangle! i will use this to adapt my script to fill user_sdo_geom_metadata. Cool :)
  • functions to investigate the data of sdo_geometry (number of rings, number of vertexes, number of coordinates, the coordinates itself, ...)
  • functions to investigate the meta-data
  • functions to manipulate your sdo-geometry: scale, rotate, affine transformations, move, adding and removing points, ...
  • ... and lots more ... In short: awesome!! So after getting a terrifying scare, now rebuilding the databases (and adapted the script: won't happen to me again). I mailed the original creator to add a note of warning on his site too. Might help other absent-minded developers :) I also had another smaller problem, because the install-script overruled my ORACLE_HOME and PATH definition, which made sure SQL*PLUS was not found. But that was easily cured. I am very grateful that he puts up this package for free. And i learned another great lesson today: not to get too enthusiastic before i am sure that it all works ;)

I am now working in Ruby on Rails for about a month. I have done a few smaller things before, investigating, trying out; but now I am really building a full application site in Rails! It feels great :) But of course, just starting out in such a new environment, both ruby and rails, I encountered a lot of initial problems. One of them is showing international, accented characters, like é, è, ê ... from the database. Nor Rails nor irb could show the correct characters. I really needed to solve this problem, so this started an investigative journey.

The symptoms

When displaying a table of results from Rails in any browser, all accented characters were shown as white question marks inside a black rhombus (diamond?). When i tested this with irb i also got very weird results. When using TOAD the results of a query are displayed correctly. But maybe TOAD just does it very clever? If i use the GUI version of SQL*Plus (sqlplusw) the results are also shown correctly. But when i use the console (command-line) version of sqlplus, the characters were also distorted. So maybe the clue could be found there.

Oracle NLS-LANG

No matter what character-set the Oracle database uses to store the data (e.g. UTF-8, UTF-16) the data is always converted to the clients character set. This is truly an amazing feature. This depends on a setting called NLS_LANG. Not sure wat it quite stands for. NLS_LANG is built up as follows:NLS_LANG=language_territory.charsetThe NLS_LANG property is set by default in the registry to the correct Windows codepage.NLS_LANG="AMERICAN_AMERICA.WE8MSWIN1252"But why doesn't it work inside the command line console? Apparently because it used a different codepage. If you type chcp (requesting the codepage) at the DOS prompt, it would normally return 437 (it did on my machine). So you would need to enter`

set NLS_LANG=american_america.US8PC437 require 'oci8' OCI8.new('user','pw','db').exec('select * from emp') do |r| puts r.join('|'); end `in your irb and then all results would be displayed correctly. The crucial line being the correct setting of NLS_LANG. Wow! I got my results correctly in ruby! Now i was in the assumption that Rails would be a piece of cake, but that was wrong.

Fixing Rails

The easy idea would be to set NLS_LANG in Rails correct, before the oci8-library is required. My first approach was to set the NLS_LANG in the first line of the environment.rb with the following line:ENV["NLS_LANG"] = "AMERICAN_AMERICA.WE8MSWIN1252"But this didn't work. I am using NetBeans 6.5, and it took me a while to realise that if I edited a file to contain special characters (fixed text, e.g. on a menu) it would work. NetBeans (or Rails for that matter) standard works with UTF-8. So all files are encoded in that way. The easy solution would be to useENV["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"But that didn't work. I tried the alternative AMERICAN_AMERICA.UTF8 but that didn't work either. I am just guessing here, but i think somehow the NLS_LANG setting didn't get picked up in the Rails environment. It kept using the registry setting. So I tried to turn it around: make Rails use the windows codepage instead of utf-8. This took several steps:

  • change the default setting inside NetBeans to use the correct code-page instead of UTF-8. This in fact only affects the format of the files that are saved, but it is nevertheless important that all files being served are in the same format. Also convert all previously edited files from UTF-8 to standard ASCII. I used an editor to do that.

  • added the following to application_controller.rb:` before_filter :headers_iso

    def headers_iso

    make sure the charset matches the default Oracle NLS setting

    headers["content-type"]= "text/html; charset=windows-1252" end `

  • added the correct META-tag to application.rhtml in the HEAD-section:``Note: this meta-tag is not really needed, it has no real effect. And that finally worked! :)

ORA-01031 error when creating database

At my new job I am back to administering Oracle. The last time I had to do any dba-job, they were still using Oracle 7 :) At my current job, the IT-department can be summed up in one word: me! So, now i have to do everything again. So today, i was trying to get a database installed on a laptop left behind, with no passwords. I was able to solve that, luckily. I had a script to create the database in Oracle 8, had to port it to 9. The biggest change there was: instead of svrmgrl you have to use sqlplus. So i could run the script, but kept getting ORA-01031 error (insufficient privileges). At first: didn't use a sys-account, then lost password of sys-account (set it again using Enterprise Manager), logon using sys-account, but my network user had no ora_dba rights. Logon with a local user with adminstrator rights, put him in the right group, and then it still didn't work ... And at those times, I am so happy there exists this something called internet. I found my solution here. I would never have thought of that. Go figure: a whole day spent trying to create a database. MS SQL Server or MySQL seem so much easier. Industry leader or not, you would suppose they could make equally simple. Or maybe i just haven't found it yet. Don't mention import and export. That is also a pain. But once it is runnning ... :)

Oracle 10 Troubles --continued

We had more than a few troubles migrating to Oracle 10, in our production environment. We are using a clustered server, with 3 nodes, so it should be blindingly fast. I'll sum up our biggest issues (we had) :

  1. Sequences : we used a sequence to make sure the messages we send, can be ordered. Somehow our sequence was set to "NO ORDER", but in a single-node environment this always works. In our environment, we suddenly go gaps, leaps, depending on which node you were connected to. This seemingly unexplainable behaviour (sequences are always in order, so the error must be something else), was luckily quickly discovered (are they really?), and easily fixed.
  2. ORA-01483 : all our c++ server software, running on NT, is built using C++ Builder and BDE. We have had troubles like hell with "max open cursor" problems. So, fix, or workaround : count maximum open cursors, if this approaches the database maximum, re-connect to the database. Dirty, but it works. So, in Oracle 8, we had upped the SESSION_CACHED_CURSORS to 250. When we did the same in the Oracle 10g, suddenly, out of the blue appeared the ORA-01483 error. Whammo !! It took us quite a while to figure out what exactly caused the errors. Resetting the aforementioned to it's default value fixed that problem.
  3. Database drop-out : every night our database just crashes, hangs. This is related to "open cursors problem". Our applications count the amount of open cursors using the v$-tables, which apparently after a while causes Oracle to hang. Here is the query which correctly counts the open cursors given the program name : [sql] select distinct a.value, s.sid, s.machine, s.program, s.inst_id, s.server from gv$sesstat a, gv$statname b, gv$session s where a.statistic# = b.statistic# and b.name = 'opened cursors current' and s.sid = a.sid and s.inst_id = a.inst_id and upper(s.program) like '[program-name]%' order by 1 desc [/sql] The query we used before, was : [sql]select count(*) from v$open_cursor where sid=:sid[/sql] To solve this, i changed the server-process causing the database to hang each night, to use DOAinstead of BDE. Apparently this also alleviates our open cursor problem. I did notice that open cursors, using DOA, are only really closed in Oracle after a commit. This is taken into production tonight, let's hope it will finally improve our databases up-time. For the moment we manually stop and start our database twice a day, in a controlled way, in the hope our database can keep up for the rest of the time. Oracle are looking into our case, i am not quite sure why our setup is different than the rest of the world. It has been a very hectic two weeks. Finally we are reaching a somewhat stable environment, although it still requires a lot of manual intervention to keep it running smoothly.