For PGSQL Phriday #016, Ryan’s asked us to “…discuss your process for tuning difficult queries. Specifically, try to focus on that one problematic query that really challenged you and you always use it as an example when helping or teaching others your methods.”
(Yes, I am two months late publishing this.)
Here are the generic steps I take, mostly in order.
- Pause and reflect on the holy trinity of troubleshooting questions: what did you expect to happen, what actually happened, and how long has it been acting this way?
- Just like working on any other codebase, check the query into source control so you can track your changes.
- Run it through pg_format (https://github.com/darold/pgFormatter) for readability. This is especially useful if multiple people have worked on the query and brought their own ideas about SQL style.
- Verify that the tables used in the query have been ANALYZEd recently.
- Create a few passing and failing tests. This will help verify that you are still getting the results you expect. I use pgTAP (https://pgtap.org/).
- Give the query a quick scan for obvious things like LIMIT without ORDER BY, DISTINCT on columns that already have unique constraints, and UNION where a UNION ALL would work.
- If I can get the output of EXPLAIN (ANALYZE, BUFFERS, TIMING) from a production or production-like system, great. (Don’t forget to run it in a transaction if it will alter data.) Plug it into https://explain.depesz.com, and look for wildly differing estimated vs actual row counts, loops > 1 on large datasets, and any red areas. All of those are good starting points.*
- Unless it’s a fairly small query to start with, break it into its component pieces. CTEs, subselects, and UNIONs are all good breakpoints. This is a critical step to finding slow points if I don’t have access to EXPLAIN output.
- For each piece, review the FROM clause first, since it’s first in SQL order of operations. Check for tables which have no columns in the SELECT statement. If it’s not needed as a join table, try removing it. (This is where it’s crucial to have reliable tests.)
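To make the EXPLAIN step above concrete, here’s the general shape I use when the query modifies data; the table and UPDATE are purely illustrative, not from the query in question:

```sql
BEGIN;

-- EXPLAIN ANALYZE actually executes the statement, so wrap anything
-- that writes data in a transaction you can throw away.
EXPLAIN (ANALYZE, BUFFERS, TIMING)
UPDATE orders
   SET status = 'shipped'
 WHERE shipped_at IS NOT NULL;

ROLLBACK;  -- the UPDATE never commits; you keep only the plan output
```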
After that I just make it up as I go.
As a specific example, there’s a query that certainly wasn’t the worst I’ve ever worked on, but it was the most memorable, because I learned about join_collapse_limit.
It was part of a migration from another RDBMS, and took 20 minutes to return a hundred or so rows on Postgres. It featured multiple UNIONs, and each query had INNER and OUTER JOINs on a dozen+ tables. Throw in very similarly-named tables (imagine “service_server_servers”, “service_servers_server”, “servers_service_servers” etc), aliases that bore no apparent relation to their table names, and multiple different SQL styles, and I knew immediately I would be entertained for days.
I had a couple of things going for me: 1) a fixed dataset (so I could be sure I wasn’t getting incorrect values back) and 2) the query was written by actual humans who were really invested in making it better.
I’d gone through my usual steps: set up a whiteboard and used every color of dry erase marker I owned to diagram the JOINs, removed some unneeded JOINs, and converted some subqueries to JOINs. This shaved off a few minutes, but didn’t provide the gains I was hoping for.
Looking at the EXPLAIN plan for one of the UNIONed queries, the working dataset from the first two joined tables was several hundred thousand records. I checked pg_stats for the various tables and query parameters and thought that if I re-ordered the JOINs to select a smaller dataset first, that would limit the size of the subsequent datasets. But it didn’t have any reliably positive effect.
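The kind of pg_stats reconnaissance I mean looks something like this; pg_stats is the system view populated by ANALYZE, but the table and column names below are made up to echo the example above:

```sql
-- Estimate selectivity before deciding which join should run first.
-- n_distinct < 0 means "a fraction of the row count" (e.g. -1 = all unique).
SELECT tablename,
       attname,
       n_distinct,
       null_frac,
       most_common_vals
  FROM pg_stats
 WHERE schemaname = 'public'
   AND tablename IN ('service_server_servers', 'service_servers_server')
   AND attname = 'server_id';  -- hypothetical join column
```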
A coworker (thanks, Jeff!) tipped me off to join_collapse_limit; if you’re JOINing more than join_collapse_limit tables, the planner won’t try every single potential plan, because it’s not a good use of system resources. The default value is 8.
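You can check what your session is currently using with a one-liner:

```sql
-- Defaults to 8: past that many items in the FROM list, the planner
-- stops exhaustively reordering joins.
SHOW join_collapse_limit;
```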
I set join_collapse_limit to 1 in my session (don’t set this globally) to force Postgres to respect my written JOIN order. This made a significant improvement in speed. I’ve used this method successfully a few times since then, but note that if the distribution of data changes significantly, you may need to re-order the joins.
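In practice the session-level SET is what I described above; SET LOCAL is a slightly safer variant I’d suggest considering, since it resets on its own:

```sql
-- Session-scoped: affects only the current connection, not the server.
SET join_collapse_limit = 1;

-- Or transaction-scoped: reverts automatically at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL join_collapse_limit = 1;
-- run the problem query here; joins execute in written order
COMMIT;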
* Henrietta Dombrovskaya’s book is a good reference to learn about different scan and join types; this is helpful for interpreting EXPLAIN plans beyond these three tips. https://www.goodreads.com/book/show/57285570-postgresql-query-optimization