The Spam Club

How to verify own images against goodolddays DB?

Posted at 22:10 on July 4th, 2017
Member
Master Gumby
Posts: 121
Since I've collected many images from different sources over the years, I'm wondering what the best method is to verify them against this DB.

I know that some other retro gaming projects provide .dat files for checking the integrity of files.

Do you provide something similar? E.g. is there a table with the SHA1 hashes of all images that could be used to scan one's own library automatically?

It would be a bit time consuming to check them one by one.
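
A scan of this kind could be sketched as follows. This is only an illustration, assuming a table that maps SHA-1 hashes to titles is available; the names sha1_of_file, scan_library and known_hashes are hypothetical and not something the site provides.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_library(root, known_hashes):
    """Split the files below root into known and unknown images.

    known_hashes is a hypothetical table mapping a lowercase hex
    SHA-1 digest to a title.
    """
    matched, unmatched = {}, []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        title = known_hashes.get(sha1_of_file(path))
        if title is None:
            unmatched.append(path)
        else:
            matched[path] = title
    return matched, unmatched
```

Calling scan_library on a folder returns the recognised files with their titles and a list of everything that did not match.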
Posted at 04:56 on July 5th, 2017
Admin
Reborn Gumby
Posts: 11154
If you tell us what a ".dat file" would consist of, it could probably be done.
-----
Now you see the violence inherent in the system!
Posted at 21:33 on September 7th, 2017
Member
Master Gumby
Posts: 121
Today, I created a ".dat file" and thought it was worth sharing here.

The file can be used in Clrmamepro to scan a set of disk images against The Good Old Days database. This way, you can check whether any of your images match the images present on the site.

Originally posted by Mr Creosote at 04:56 on July 5th, 2017:
If you tell us what a ".dat file" would consist of, it could probably be done.

Something like that. ;)
Attachment: *****
-----
Edited by fuxxxyfloppy at 21:35 on September 7th, 2017
Posted at 17:26 on September 12th, 2017
Admin
Reborn Gumby
Posts: 11154
There you are: /tgod_floppy_images.dat. If it works for you, I can automate the regeneration, e.g. on a weekly basis.
-----
Now you see the violence inherent in the system!
Posted at 20:53 on September 12th, 2017
Member
Master Gumby
Posts: 121
Originally posted by Mr Creosote at 17:26 on September 12th, 2017:
There you are: /tgod_floppy_images.dat. If it works for you, I can automate the regeneration, e.g. on a weekly basis.

Thanks. An automatically generated dat file will be really helpful. Including the description, status and year is also a great idea.

However, I'm not sure how I can use your file (at least it doesn't work with Clrmamepro). I will try again tomorrow. Maybe I need to convert it to a different dat format, or I need to use different software. Do you have any ideas?

Edit:
I noticed a few minor things:
- Your file does not include all image sets (e.g. Battle Isle 2 (1), Day of the Tentacle (4)).
- Your file shows 'status="good"' where "Modified" (e.g. Zeliard (16)) or "Unverified" (e.g. Battle Bugs (2)) is mentioned on the homepage. I expected the two to match. Or was this done on purpose?
-----
Edited by fuxxxyfloppy at 21:06 on September 12th, 2017
Posted at 06:26 on September 13th, 2017
Admin
Reborn Gumby
Posts: 11154
I followed the DTD linked inside your example file. It is possible, of course, that some tools expect specific subsets or specific additional tags. If they provide their own definition, tell me where to find it.

I omitted sets which aren't actually available on the server. This could be on purpose, or simply an issue with naming convention. Would need to check each individually.

The status attribute does not allow for many different values in that DTD. I mapped "bad" and "verified" explicitly and let all other states default to "good", meaning assumed to be working, even if not perfect.
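
The mapping described above might look roughly like this. It is a sketch and not the actual generator code. It assumes the DTD allows the status values baddump, nodump, good and verified, and that the site uses labels such as "Bad", "Verified", "Modified" and "Unverified". SITE_TO_DTD and map_status are hypothetical names.

```python
# Status values the Logiqx DTD is assumed to allow (assumption, not checked here).
DTD_STATUSES = {"baddump", "nodump", "good", "verified"}

# Hypothetical site labels on the left; only these two get a dedicated value.
SITE_TO_DTD = {"bad": "baddump", "verified": "verified"}


def map_status(site_status):
    """Map a site status label to a DTD status, defaulting to 'good'."""
    return SITE_TO_DTD.get(site_status.strip().lower(), "good")
```

With this table, "Modified" and "Unverified" both end up as "good", which matches the behaviour reported earlier in the thread.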
-----
Now you see the violence inherent in the system!
Posted at 20:43 on September 13th, 2017
Member
Master Gumby
Posts: 121
Thanks for your reply. (I'm new to DAT files... ;) Therefore, maybe I overlooked something.)

Originally posted by Mr Creosote at 06:26 on September 13th, 2017:
I followed the DTD linked inside your example file. It is possible, of course, that some tools expect specific subsets or specific additional tags. If they provide their own definition, tell me where to find it.


I found this link. It gives a good overview of the different DAT formats: https://github.com/SabreTools/wizzardRedux/wiki/DAT-File-Formats

I used the software ClrMamePro. Therefore, the corresponding format should work.

However, the tool https://github.com/SabreTools/SabreTools should be able to convert between different formats (including from the Logiqx XML you created). But I didn't get it to work (maybe because I used Linux and Wine?).

Originally posted by Mr Creosote at 06:26 on September 13th, 2017:
I omitted sets which aren't actually available on the server. This could be on purpose, or simply an issue with naming convention. Would need to check each individually.


Not available on the server? But Day of the Tentacle (4), for example, http://www.goodolddays.net/diskimages/id%2C4/ has a download link. Or do you mean something else?

Originally posted by Mr Creosote at 06:26 on September 13th, 2017:
The status attribute does not allow for many different values in that DTD. I mapped bad and verified and kept all other states to default to good, meaning assumed to be working, even if not perfect.


I understand. No problem.
-----
Edited by fuxxxyfloppy at 20:45 on September 13th, 2017
Posted at 05:29 on September 14th, 2017
Member
Pupil Gumby
Posts: 13
I was pointed here by one of my fellow collectors who noticed that my tool was brought up for the purposes of making DAT files. To the person who was using Linux, you need to use mono and not wine because my program is written in C# which tends not to be wine-friendly.

Also, I've been personally datting the TGOD sets as I've seen them (without releasing the results to anyone). Here's a copy that includes deep hashes (since, again, nobody but me was using them). Hope this helps, and I'll be glad to assist with anything related to datting :)
Attachment: *****
Posted at 06:26 on September 14th, 2017
Admin
Reborn Gumby
Posts: 11154
Thank you both, this is very helpful.

I'm puzzled why both of your examples extensively use a "machine" tag. Where is this specified? The DTD has "game" instead.
-----
Now you see the violence inherent in the system!
Posted at 07:58 on September 14th, 2017
Member
Pupil Gumby
Posts: 13
<game> is the formal specification set by the original Logiqx format, while MAME uses <machine> because it's broader than just a game. They're pretty interchangeable, though I tend to like using <machine>. It's purely preference as of right now.
Posted at 07:58 on September 14th, 2017
Member
Master Gumby
Posts: 121
Originally posted by darksabre76 at 05:29 on September 14th, 2017:
To the person who was using Linux, you need to use mono and not wine because my program is written in C# which tends not to be wine-friendly.

Thanks for the comment. Now it works (although I thought I had tried mono before).

@Mr Creosote:
After a lot of trial and error, I managed to get your file working. Here are the important changes that are needed:

- change "<disk name=" to "<rom name="
- add CRC checksums (it seems that a CRC is required, while other checksums are optional?)

Everything else can stay as it is.

A minor change that does not affect the scan result, but may influence the metadata shown in some tools:

- try to avoid identical game names like <game name="Loom"> (e.g. use the archive name or add the id to the game name)

At least in ClrMamePro, it seems that sets with the same game name are merged, so the individual descriptions are overwritten and lost. I'm not sure how other tools behave here.

@darksabre76: Do you know whether my observations above are correct?
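
The changes listed above could be sketched like this. It is a minimal illustration, not the site's generator. build_game and its parameters are hypothetical, and the attribute names name, size, crc and sha1 are assumed to be the ones the Logiqx format uses for <rom>.

```python
import xml.etree.ElementTree as ET


def build_game(title, image_id, description, files):
    """Build one <game> element with a unique name and <rom> children.

    files is a list of (name, size, crc32, sha1) tuples, where crc32 is
    an integer and sha1 is a hex string.
    """
    # Appending the id keeps sets with the same title (e.g. "Loom") apart.
    game = ET.Element("game", name=f"{title} ({image_id})")
    ET.SubElement(game, "description").text = description
    for name, size, crc32, sha1 in files:
        # <rom> rather than <disk>, with the CRC as an 8-digit hex string.
        ET.SubElement(
            game, "rom",
            name=name, size=str(size), crc=f"{crc32:08x}", sha1=sha1,
        )
    return game
```

The element can then be serialised with ET.tostring(game, encoding="unicode") and placed inside the datafile's root element.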
-----
Edited by fuxxxyfloppy at 07:59 on September 14th, 2017
Posted at 09:22 on September 14th, 2017
Admin
Reborn Gumby
Posts: 11154
I can easily make these structural changes in the generator code, even if it seems quite strange to have a DTD but not to follow it. In addition, clearly these are neither "machines", nor "ROMs".

The addition of CRC is problematic, however. We don't have this inside our database and frankly, it is unnecessary, so I don't really want to require it in the future. The only option would be opening each archive and generating the values from the files directly, but doing that each time the index is regenerated for all games is total overkill. Especially if the only purpose is to support a buggy application.
-----
Now you see the violence inherent in the system!
Posted at 11:36 on September 14th, 2017
Admin
Reborn Gumby
Posts: 11154
I may have found a workaround for the CRCs, but could you confirm that the ominous CRC is actually a CRC32, and not some other type?
-----
Now you see the violence inherent in the system!
-----
Edited by Mr Creosote at 12:03 on September 14th, 2017
Posted at 19:31 on September 14th, 2017
Member
Master Gumby
Posts: 121
Originally posted by Mr Creosote at 09:22 on September 14th, 2017:
In addition, clearly these are neither "machines", nor "ROMs".

I think it's because the DAT format was mainly developed for managing roms and later adapted for other media (e.g. CD images [redump.org] and game files [The Total Dos Collection])

Originally posted by Mr Creosote at 09:22 on September 14th, 2017:
The addition of CRC is problematic, however. We don't have this inside our database and frankly, it is unnecessary, so I don't really want to require it in the future. The only option would be opening each archive and generating the values from the files directly, but doing that each time the index is regenerated for all games is total overkill. Especially if the only purpose is to support a buggy application.

I don't know why the software requires CRCs while the other checksums are optional. I was surprised, too. That makes the additional checksums almost meaningless.

If it's too much work, you can leave it, especially if there won't be many users who will use it.

My idea was to find a way to automatically check one's own sets of disk images (img files) against the sets analyzed and published here on TGOD. Since many people use the DAT format and the corresponding software to manage and check their ROMs, CDs and DVDs, and it seems to work well, I thought it might be a good approach here, too.

Originally posted by Mr Creosote at 11:36 on September 14th, 2017:
I may have found a workaround for the CRCs, but could you confirm that the ominous CRC is actually a CRC32, and not some other type?

Yes, I checked it and it's a CRC32.
-----
Edited by fuxxxyfloppy at 19:31 on September 14th, 2017
Posted at 21:08 on September 14th, 2017
Member
Pupil Gumby
Posts: 13
Originally posted by fuxxxyfloppy at 07:58 on September 14th, 2017:
@darksabre76: Do you know whether my observations above are correct?


CRCs are indeed required in all DATs currently, though usually they include CRC32, MD5, and SHA-1 hashes. My above DAT uses SHA256, SHA384, and SHA512 additionally, though they are not supported by any tool currently.

As for using "<rom>" instead of "<disk>", you are also correct. "<disk>" refers to a special type of file called a CHD that is used by MAME and related emulators to store CD, DVD, LD, and HDD information. Everything else, regardless of original medium is considered a "<rom>". There are other variants, but those are the main ones.

Similarly, identically named sets DO cause issues in most rom managers, so having a unique identifier such as "Game Name (ID)" would be best. Again, see my previously attached DAT for an example of this.

Outside of that, the reason that CRC32 is the base required hash is due to backwards compatibility. Before there were better notions of how DATs should be created, the bare minimum of a CRC32 and the size were enough to get most items properly accounted for. Now that CRC32 and MD5 have been broken, they are still valuable to have due to backwards compatibility, but having at least 2 hashes available is best, usually CRC32 and SHA-1.

You will not find a single ROM manager currently that can bypass the requirement of CRC32 because of the above. The only saving grace is that since you store your files in ZIP format, you can just grab the CRC32 values directly from the ZIP headers with little issue. This should make the initial population of your internal database quite a bit simpler.
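
Reading the values from the ZIP headers could look like the following sketch. It uses Python's standard zipfile module; the function name zip_member_checksums is made up for illustration, and the actual site code may do this differently.

```python
import zipfile


def zip_member_checksums(archive_path):
    """Return (name, size, crc32_hex) for every file stored in a ZIP archive.

    The values come from the archive's directory records, so no member
    has to be decompressed.
    """
    entries = []
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # info.CRC is the CRC32 of the uncompressed data as an integer.
            entries.append((info.filename, info.file_size, f"{info.CRC:08x}"))
    return entries
```

Because only the directory records are read, this stays cheap even when the index is regenerated for all archives.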
Posted at 21:20 on September 14th, 2017
Member
Pupil Gumby
Posts: 13
Originally posted by Mr Creosote at 09:22 on September 14th, 2017:
I can easily make these structural changes in the generator code, even if it seems quite strange to have a DTD but not to follow it. In addition, clearly these are neither "machines", nor "ROMs".


The DTD from Logiqx has been long stagnant, so it's no surprise to me that modern convention trumps the DTD for everything.

As for the terminology of ROMs, Machines, Games, Disks, etc, these all started in the emulation community and were just adapted along the way. Here's a decent breakdown of some of the terms used in DATting that might help:

- ROM : Any file that is not a CHD image, originally referred only to arcade ROMs
- Disk : Any file that is in the CHD format. Look it up in MAME documentation for more
- Game / Machine : Generic title for a "set", that is the parent folder of a group of roms and disks. Again, terminology stems from usage in emulation primarily with Machine being newer
Posted at 06:22 on September 15th, 2017
Admin
Reborn Gumby
Posts: 11154
Grabbing the CRCs from the archives is indeed my plan.

Are you saying size is also "required"?

Understood about the conventions. I'm simply baffled why everybody, including the conversion tool's website, cites the DTD, but without ever reflecting the striking difference with practical implementation. It seems to be assumed "everybody" knows.
-----
Now you see the violence inherent in the system!
Posted at 06:59 on September 15th, 2017
Member
Master Gumby
Posts: 121
Originally posted by Mr Creosote at 06:22 on September 15th, 2017:
Are you saying size is also "required"?

When I checked it yesterday, it was working without including the file size.
Posted at 17:50 on September 15th, 2017
Member
Pupil Gumby
Posts: 13
Strictly speaking the size is NOT required, but it is highly recommended to include the size of the file as well due to, again, backwards compatibility and better security. This is also information you can pull if you're looking at the zipfile headers the first time, so it should not be an issue. The only time that I've seen the size not included is for <disk>, because it relies on an internal hash instead of the external one, so the size doesn't strictly matter.
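
As a rough sketch of why the size is useful alongside the CRC32: a checker can compare the size first, which is cheap, and only read the file when the size matches. The function names below are hypothetical, and this is not how any particular ROM manager is implemented.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an unsigned integer, read in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def matches_entry(path, expected_size, expected_crc_hex):
    """Check the size first (cheap), then the CRC32 (reads the whole file)."""
    if os.path.getsize(path) != expected_size:
        return False
    return crc32_of_file(path) == int(expected_crc_hex, 16)
```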
Posted at 13:44 on October 2nd, 2017
Admin
Reborn Gumby
Posts: 11154
New file available in the same place. Is it now accepted by the tool?
-----
Now you see the violence inherent in the system!