Techie Stuff too much for some (was - What's Happened to Optimus Trousers?)

Has he been turned off, and why hasn’t he reported that Fonte is, apparently, in talks with Villa about a possible transfer?

Brendan bought him

3 Likes

I’m sitting in a coffee shop right now and my hands are covered in his guts and bits of brain.

Ripping him open was relatively easy, however he’s proving a little Humpty-ish at the moment.

Actually Big Stupid Bob, you’ll appreciate this as a DB man. I’ve been trying to migrate from one MongoDB cloud host to another. But NoSQL DBs don’t have the same maturity level for the support tools. So I’ve had to write a dump/load routine and encountered all sorts of permission issues.

I’ve got the DB stuff done now, but, well, you did ask: I use an API from bit.ly to shorten URLs, and since I optimised the code to make Optimus Trousers even more patronisingly verbose, even more quickly, I’m hitting a rate limit with that API.

So right now, you are interrupting me from implementing a governing routine to ensure that he makes no more than 20 calls to bit.ly within a 10 second period.
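For what it’s worth, that sort of governing routine can be a small sliding-window limiter. Here’s a minimal Python sketch (class and method names are mine, not OT’s actual code):

```python
import time
from collections import deque

class RateLimiter:
    """Permit at most max_calls calls in any rolling period-second window."""

    def __init__(self, max_calls=20, period=10.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock     # injectable, so the window logic can be tested
        self.calls = deque()   # timestamps of recently permitted calls

    def allow(self):
        """Return True and record the call if we're under the limit."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

The caller sleeps and retries (or queues the URL) whenever `allow()` returns False, so no more than 20 bit.ly requests ever fire in any 10-second window.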

Now does anybody know where this spleen goes?

https://vine.co/v/MqxPlIJFTMw

1 Like

In solving these problems, I’m reminded of this cartoon.

I’ve never been a quick and dirty sort of hacker,

1 Like

I’ve been reading some of the claims about relational databases on the MongoDB site and they are, quite frankly, wrong!

Downtime for a whole database just to add a new column to a table? When did they last deal with a relational database?

Dunno BBB, I’m guessing that their comparison is against one of the common LAMP-stack DBs such as MySQL or PostgreSQL?

Do your comments come from experience of Oracle/SQL Server? I wouldn’t be surprised if they managed this sort of thing really well too.

I think that it’s a general statement of the flexibility of something like Mongo that comes from its fundamentally different approach.

Mongo, and in fact most document databases that I’ve dabbled with, take a little bit of head-morphing if you’ve come from a relational world - as I did/do.

Basically, a Collection contains Documents and each Document contains a series of JSON-like Field=Value pairs.

You might instantly see that as Collection = Table. Document = Row. Field=Value = Column.

To an extent that is true, and you can make a Mongo Collection behave like a relational Table, but every Document you put in a Collection can have different Fields.

There is no schema that dictates what goes into a Collection/Document and how. You can store anything you like in there.

For example, I could have a Collection that contained the following Documents:
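A stand-in pair of Documents in that spirit (names and fields invented purely for illustration), written as Python dicts the way a driver like pymongo would accept them:

```python
# Two Documents destined for the same hypothetical "signings" Collection.
# They share no schema: different fields, and the second one carries an
# embedded list of things that relate to it.
doc_one = {"name": "Fonte", "position": "CB", "shirt_number": 6}
doc_two = {
    "name": "Optimus Trousers",
    "type": "robot",
    "spare_parts": ["spleen", "bits of brain"],  # an embedded, related list
}

# With a live MongoDB to hand, both would happily go into one Collection:
#   from pymongo import MongoClient
#   MongoClient().test.signings.insert_many([doc_one, doc_two])
```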

Note that Documents can contain completely different stuff, and also a Document can contain lists of things that relate to it.

You, me and Ted Codd might be reaching for the normalisation stick at this point, to move those lists of things into a related table, but there is no performance reason to do that with Mongo.

What this means is that unlike an RDBMS project that requires significant up-front planning, you can get running with MongoDB very quickly and address the indexing/structure/linking details later.

Mongo really comes into its own in massive, sharded and/or clustered datasets. It’s also really good at managing the life-cycle of Internet-related information… Documents can have TTL values after which they die, and Collections can be capped (my problems at the moment relate to extending a capped collection) to a specific number of documents; older documents are simply dropped as the Collection grows. It removes the need for a great deal of admin and DBA activity.
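A capped Collection behaves much like a fixed-size ring buffer, and Python’s deque makes a serviceable analogy (an analogy only; this is not how Mongo implements it):

```python
from collections import deque

# A capped collection of at most 3 documents: inserting a 4th silently
# ages out the oldest, much as Mongo drops old documents when a capped
# Collection reaches its limit.
capped = deque(maxlen=3)
for doc in ({"n": 1}, {"n": 2}, {"n": 3}, {"n": 4}):
    capped.append(doc)

print([d["n"] for d in capped])  # → [2, 3, 4]; {"n": 1} has been dropped
```

In real Mongo the cap is set when the Collection is created (capped, with a size and an optional maximum document count), and TTL expiry comes from an index created with `expireAfterSeconds`; the exact calls vary by driver.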

I’m sure those things can be found in mature, commercial DBMSs too, but I really don’t think the two compete with each other.

So, OT has been trained to stalk new signings and take video and pictures of them from the bushes around the training ground? Cool!

GET A ROOM YOU MUPPETS!

You’re not in my league… I’ve just set up my wife’s new Dell laptop and everything appears to be working. :astonished:

This is about as stressful as my life gets these days. :stuck_out_tongue_closed_eyes:

What about imported aubergines?

Yes, yes Saintbletch, I understand all the pros and cons of JSON and XML and NoSQL databases; it’s not a problem for me, I’m not totally DB-focussed, having come through the development ranks. And yes, I am primarily Oracle and MSSQL focussed these days, although I still deal with MySQL and PostgreSQL databases and Access (but we won’t call that heap of stinking shit a database, shall we?)

What I don’t really like is MongoDB making these spurious claims about RDBMSes that anyone with a shred of nous would know are false. Unfortunately they’d be selling their products to the CEOs who haven’t got any nous.

They might as well say “You know all those changes to the DB that your devs have been making without any downtime? Well, now you can do it without downtime but in an AGILE manner”. CEOs will hear the AGILE bit and cream themselves.

I would love to play with some NoSQL stuff at some point but unfortunately our company has different drivers and our RDBMS tech is largely defined by our suppliers and .Net-focused developers, oh well…

MongoDB is open source and free, bobman.

Free as in beer, and free as in speech.

Nobody is trying to sell ‘it’ per se. But some organisations obviously have a vested interest in making claims about it.

I’d be interested in what steps you have to take, or what infrastructure you need to add stuff to tables in a production environment on the fly.

How does it do it?

Versioning?

What happens to queries that are in flight?

When does the new column become available?

What happens with empty columns that haven’t yet been populated?

What if the new column is part of an index / compound index?

Will apps break if they are not compiled to the new schema?

Is this available in lots of RDBMS, or is it vendor-specific?

Not taking a side here, Bob. Just looking to understand what the differences really are.

Seriously Bletch, wow, I am surprised, or are you taking the p?

I can’t speak for all databases, only the ones that I am aware of and use, i.e. Oracle, MSSQL, Postgres and MySQL (and these can be different depending on what storage engine you’ve used for the database)!

  • How does it do it?

Generally this is down to the DBA running a bunch of DDL scripts to alter the table or, heaven forfend, using a GUI/wizard?

  • Versioning

Not sure what you mean here? Versioning of the database in case you need to roll back the changes? There are ways and means of doing this, but generally single-table backups or full database backups are the main way of doing that?

  • What happens to in-flight queries

If you’re not altering any index then nothing will happen to the query; the RDBMS has cached the plan and is retrieving the data. The query won’t be referencing the column that you are adding, so nothing happens.

  • When does the new column become available

As soon as the command finishes executing. Obviously this can be a long time if the table has been locked for whatever reason.

  • What happens with empty columns that haven’t yet been populated

If they’re defined as nullable then they will be NULL, i.e. no data. If you define them as NOT NULL then, if the table has data already, the column will not be added unless you specify a default. If you do this, the column will be added and populated with the default value.

  • What if the new column is part of an index / compound index?

It won’t be until after the column has been added! If you are adding the column to an existing index then that index will have to be rebuilt; this won’t necessarily cause downtime but may affect query performance. If it’s a new index then that new index will be built. Queries won’t “know” about the index until after it has been completely built.

  • Will apps break if they are not compiled to the new schema

Not necessarily. Some DML statements may fail if they don’t explicitly state the columns, i.e. insert into a values (1,2,3,4) will fail if you have added a 5th column and it is specified as NOT NULL without a default.

  • Is this available in lots of RDBMS, or is it vendor-specific?

As said above it is available in most of the mature RDBMS that I work with.

One caveat to this is that the column is being added to the end of the table. If you wanted to add a column to the middle of a table then that is another kettle of fish altogether and would involve downtime BUT why would you want to do that? To make the table definition look pretty?
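Bob’s answers are easy to sanity-check with whatever RDBMS is lying around. Here’s the nullable/default and positional-INSERT behaviour demonstrated in SQLite (standing in for the bigger engines; the details differ in Oracle/MSSQL but the principle doesn’t):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE players (id INTEGER, name TEXT)")
cur.execute("INSERT INTO players VALUES (1, 'Fonte')")

# A nullable column: available as soon as the DDL finishes,
# and existing rows simply read back NULL.
cur.execute("ALTER TABLE players ADD COLUMN club TEXT")
assert cur.execute("SELECT club FROM players").fetchone() == (None,)

# NOT NULL on a populated table needs a DEFAULT; existing rows get it.
cur.execute("ALTER TABLE players ADD COLUMN fee INTEGER NOT NULL DEFAULT 0")
assert cur.execute("SELECT fee FROM players").fetchone() == (0,)

# And the DML gotcha: a positional INSERT that doesn't name its columns
# breaks once the column count has changed.
try:
    cur.execute("INSERT INTO players VALUES (2, 'Pelle', 'Saints')")
    broke = False
except sqlite3.OperationalError:
    broke = True
print("positional insert broke:", broke)
```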

Originally posted by @BTripz

Seriously Bletch, wow, I am surprised, or are you taking the p?

I can’t speak for all databases, only the ones that I am aware of and use, i.e. Oracle, MSSQL, Postgres and MySQL (and these can be different depending on what storage engine you’ve used for the database)!

  • How does it do it?

Generally this is down to the DBA running a bunch of DDL scripts to alter the table or, heaven forfend, using a GUI/wizard?

Ahh, I think this is the crux of it Bob. I was expecting you to tell me that the modern DBs have specialised features to deal with this and that it is transparent to the developer/application.

If I’m a developer of an AGILE * app, or at least something that significantly changes in functionality once/week, then I’m likely to be into downtime with your approach.

I’ve got to spec the change. Consider the wider impact. Schedule the time for the DBA. Wait for the DBA to write/test the script. Wait for the script to execute. Wait for testing, etc.

Now, I understand that the DBA in my example might be me on a small project, and so I can avoid some of those bottlenecks. However, if I am a DBA too, it’s still downtime to me as a coder.

If I’m rolling out schema-changing modifications once or twice/year on seriously important apps, then no problem. There’s likely to be a whole bunch of testing and other coder-cosseting processes too that make the schema mods simply a part of a wider process.

Consider BIG DATA* - the dataset sizes that Mongo (Mongo comes from hu_mongo_us) would likely be dealing with make schema conversions like this non-trivial; and it’s not just the size of the data set, but the speed at which new data is arriving.

These things make the issue of adding columns non-trivial in an RDBMS world.

An excellent project I saw with Mongo was where the developer was trying to mine user information by combining web-server log files, IP-address location lookups, cookie data and other related sources. The logfile data is arriving at 1000s of documents per second, and yet the app can be changed behind the scenes with no need to change the data.

I asked about versioning because I’ve seen some solutions to managing different schemas that implement versioning. These send back different datasets based upon the version of the schema the app wants to see (this is in the MDM world though).

With Mongo, and other schema-less DBs, I simply start putting different shit into the Collection.

This benefit brings with it a shit-load of other complications and caveats, but I can see how Mongo can make the claim about downtime.

Coders code; the database adapts.

The two big areas of difference for me coming from an RDBMS world are a) how to design this stuff and b) transactionality.

Because you can store related content with the entity in the same collection, should you? When should you move it out into its own collection?

And how do I implement atomic changes to the data?

The answer to these questions seems to be that some projects are crying out for RDBMSes and some are better on Mongo.
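On question (a), the embed-or-reference decision can be sketched as two hypothetical shapes for the same data (field names are mine, purely for illustration):

```python
# Embedded: the related items live inside the parent Document.
embedded = {
    "name": "Fonte",
    "appearances": [
        {"opponent": "Villa", "minutes": 90},
        {"opponent": "Everton", "minutes": 78},
    ],
}

# Referenced: the related items sit in their own Collection and point
# back at the parent, much closer to a relational foreign key.
player = {"_id": 1, "name": "Fonte"}
appearances = [
    {"player_id": 1, "opponent": "Villa", "minutes": 90},
    {"player_id": 1, "opponent": "Everton", "minutes": 78},
]
```

A common rule of thumb: embed what is read together with its parent and is bounded in size; reference what grows without limit or is shared between parents. And on (b), as I understand it Mongo only guarantees atomic updates within a single Document, which is itself an argument for embedding.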

* Buzzword alert.

Ah, so you’re talking about developer downtime rather than system downtime. For me the bigger concern would be system downtime which you try to avoid at any cost.

I have a serious problem with AGILE and this “rip things down and start again” mentality. Yes, it may be good and it makes apps more streamlined etc. etc., but does it really lend itself to big historical data stores?

I can easily see where MongoDB fits into this mentality, but can it really call itself a DB (in the purest sense) when it is really just one big, huge collection of JSON/XML/whatever? And how does indexing work if you keep changing your collections?

One thing MongoDB does do is take away a hell of a load of shit from the DBA, but then who does administer the MongoDB? The devs? But that would be putting them into downtime if they had to start managing the DB engine.

Like you say, there are a shit-load of complications and caveats that are brought into play, but Mongo obviously has its uses and it works in the real world.

This kind of dirty talk is why I joined. I don’t have a strong opinion about Big Data/Mongo-style stuff, largely coz I’ve not worked with it and only have a tangential understanding of it, whilst being able to bullshit about RDBMS almost as well as B + BBB.

At my place, devs are king and every database is like an XML instance with hundreds of tables. They’ve chewed up 4 passionate data architects who’ve tried to get some rigor and design into the situation so people can know what the frick is going on in the company easily. But whatever, we’ll stick with 50 people producing 100 reports from 200 data extracts giving 500 different results. Whatever.

2 Likes

Originally posted by @BTripz

Ah, so you’re talking about developer downtime rather than system downtime. For me the bigger concern would be system downtime which you try to avoid at any cost.

I have a serious problem with AGILE and this “rip things down and start again” mentality. Yes, it may be good and it makes apps more streamlined etc. etc., but does it really lend itself to big historical data stores?

I can easily see where MongoDB fits into this mentality, but can it really call itself a DB (in the purest sense) when it is really just one big, huge collection of JSON/XML/whatever? And how does indexing work if you keep changing your collections?

One thing MongoDB does do is take away a hell of a load of shit from the DBA, but then who does administer the MongoDB? The devs? But that would be putting them into downtime if they had to start managing the DB engine.

Like you say, there are a shit-load of complications and caveats that are brought into play, but Mongo obviously has its uses and it works in the real world.

I’m a big fan of Agile in the right place, Bob.

In one of my lives, I’m a software marketing consultant and I often work with start-up companies. The majority that I work with now are developing using Agile or iterative methods - mainly the Scrum methodology.

Agile to me isn’t about starting again, though. It’s about taking what you’ve got, understanding the delta to where you want to be next, and getting that done quickly until you don’t have any more deltas.

Agile is about doing the bare minimum to get the functionality in place, testing that with the customer by putting usable, and therefore valuable (but unfinished) software into the customer’s hands. Taking stock, adding new bare-minimum features to a backlog, prioritising them, and cutting a new version in a short timeframe.

Customers get their hands on software very quickly and therefore can influence design early, which can save months of work when compared to the

Project team said

“Ta Da!”

Customer said

“Hold on, that wasn’t what I wanted”

…approach of waterfall methodologies.

I’ve seen it work incredibly well. And yet, I completely agree that the database you get out of it at the end of the day, having been through countless iterations, may well be complete pants. This model is not particularly suited to strict schemas and RDBMSes. This is why the NoSQL DB movement is gaining such a foothold IMO.

For that reason, RDBMSes are likely to be the DB of record for years to come. However, for iterative projects that only need a working store of information, and that will eventually pump their data up- or down-stream to an RDBMS, NoSQL is a NoBrainer.

Originally posted by @anal1

This kind of dirty talk is why I joined. I don’t have a strong opinion about Big Data/Mongo-style stuff, largely coz I’ve not worked with it and only have a tangential understanding of it, whilst being able to bullshit about RDBMS almost as well as B + BBB.

At my place, devs are king and every database is like an XML instance with hundreds of tables. They’ve chewed up 4 passionate data architects who’ve tried to get some rigor and design into the situation so people can know what the frick is going on in the company easily. But whatever, we’ll stick with 50 people producing 100 reports from 200 data extracts giving 500 different results. Whatever.

As you say anal1, bad design is bad design.

Out of interest, do your devs work to some sort of iterative methodology?