Unoffical empeg BBS

#133757 - 08/01/2003 17:07 more jemplode ideas
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
since no one commented on my previous idea of a local association file between crcs and filenames, i'm just going to assume that was a bad idea (or none of you take me seriously because my handle consists of uppercase consonants [in which case i'd like to get it changed <-- directed to the red names]). but i still want the slow load time of my mp3 tree (per this method) to be addressed and hopefully fixed.

so this is my next proposal. when we first discussed deduping, i proposed a crc check that took x bytes per interval. that was shot down because of the chance of collisions. i say that we additionally implement a crc check that covers the front 16k, middle 16k, and back 16k bytes. it would be called shorthash in the tag. the sole purpose is syncing my treelist quickly and getting new songs onto my empeg. if a file's shorthash == one in the database, it's a dupe. if not, generate the real crc and add the file. as an option, you can override this and do full crc checks during the sync if you're paranoid about collisions. if you do an override and it finds two files with the same shorthash but not the same hash, jemplode notifies the user.
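A minimal sketch of the shorthash idea in Java, under stated assumptions: the class and method names are hypothetical (not jEmplode's), files too small for three separate windows are hashed whole, and mixing the file length into the checksum is an extra precaution that the proposal above does not mention.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

/** Hypothetical helper; names and layout are illustrative only. */
final class ShortHash {
    static final int WINDOW = 16 * 1024; // 16k per sample, as in the proposal

    /** CRC32 over the front, middle, and back 16k of a file, plus its length. */
    static long shortHash(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            CRC32 crc = new CRC32();
            if (length <= 3L * WINDOW) {
                // Small file: the windows would overlap, so hash all of it.
                update(file, crc, 0, (int) length);
            } else {
                update(file, crc, 0, WINDOW);                     // front
                update(file, crc, (length - WINDOW) / 2, WINDOW); // middle
                update(file, crc, length - WINDOW, WINDOW);       // back
            }
            // Assumption: also feed the 8 length bytes into the checksum.
            for (int shift = 0; shift < 64; shift += 8) {
                crc.update((int) (length >>> shift) & 0xFF);
            }
            return crc.getValue();
        }
    }

    private static void update(RandomAccessFile file, CRC32 crc, long offset, int count)
            throws IOException {
        byte[] buffer = new byte[count];
        file.seek(offset);
        file.readFully(buffer);
        crc.update(buffer, 0, count);
    }
}
```

Only about 48k is read per track, no matter how large the file is. The flip side is that a change outside the three windows is invisible to the shorthash, which is the collision risk the full-crc override is meant to cover.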

i would also like to see a hash<>fid association file implemented. this would be for the people who want to keep their play counters intact while upgrading/reformatting their hard drives. just make the association file, back up the dynamic data partition using dd, do what you need to do with the drive, use jemplode with a hash<>fid function to resync, and restore the dynamic data using dd again. voila... your play counters are back.
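One way the hash<>fid association file could look, as a rough sketch only. It assumes a FID fits in an int and that each hash is available as a plain string; the class name and the `hash=fid` line format are made up for illustration and are not an existing jEmplode file format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical association file: one "hash=fid" line per track, FID in hex. */
final class HashFidMap {
    /** Writes the hash -> fid pairs in java.util.Properties text format. */
    static void save(Map<String, Integer> fidByHash, Writer out) throws IOException {
        Properties props = new Properties();
        for (Map.Entry<String, Integer> entry : fidByHash.entrySet()) {
            props.setProperty(entry.getKey(), Integer.toHexString(entry.getValue()));
        }
        props.store(out, "hash=fid");
    }

    /** Reads the pairs back so tracks can be re-uploaded under their old FIDs. */
    static Map<String, Integer> load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        Map<String, Integer> fidByHash = new TreeMap<>();
        for (String hash : props.stringPropertyNames()) {
            fidByHash.put(hash, Integer.parseUnsignedInt(props.getProperty(hash), 16));
        }
        return fidByHash;
    }
}
```

The file would be saved before the drive is wiped and loaded during the resync, so that each track lands on the FID that the restored dynamic data still refers to.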

i've also gotten creative with deletion. if i'm listening on my workstation and i don't like a track, i would usually delete the file off my hdd and then do the same on the empeg. everyone knows that this is sometimes too hard to keep track of (hence the call for real pc<>empeg syncs). what i've done is made a RECYCLE BIN folder in the root of my mp3 tree. whenever i want to delete a file, i move it to that folder. then when i sync using jemplode, it makes a ref of the song in the playlist of the same name. then i delete it locally and "remove completely" from the empeg. my idea is to implement something like this: on sync, all tracks in a designated folder that is named in the config will be removed completely. manual deletion won't be needed anymore and we're one step closer to automatic pc<>empeg syncs.

well, those are my ideas. probably no one will reply again because they're way out there and the only person who would have any use for the features is myself.... but alas, i wish i could program in java but i can't (believe me i tried ... looking at the source made my head hurt =).

comments?

#133758 - 08/01/2003 22:02 Re: more jemplode ideas [Re: image]
suomi35
enthusiast

Registered: 16/02/2002
Posts: 290
Loc: Denver, CO
I wanted to experiment with some Jemplode features too, but same as with you, looking at the source made me $hit my pants. Someday I might take the time to figure out java...seems like it would require a lot of it!

Don't get discouraged at the lack of reply to your posts, I have noticed an overall slump since the holidays in BBS activity. It will pick back up.
_________________________
-Jason

#133759 - 08/01/2003 22:03 Re: more jemplode ideas [Re: suomi35]
Daria
carpal tunnel

Registered: 24/01/2002
Posts: 3937
Loc: Providence, RI
Everybody's too busy with their new toys.

#133760 - 08/01/2003 22:32 Re: more jemplode ideas [Re: image]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
My silence recently isn't me ignoring you -- I'm in the middle of another project for the next couple of weeks... I think you guys will be interested when it's done. But don't fret -- I keep all these messages in my "jEmplode Bugs/RFE's" folder and I do actually review all the ideas that come up.

I definitely agree that the full hash algorithm is too much and we need a much faster first-line-of-defense ... I just need to play around some with these ideas and a couple others I've been tossing around to figure out what makes the most sense.

ms

#133761 - 09/01/2003 12:18 Re: more jemplode ideas [Re: mschrag]
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
k, now that i know that you're listening, i have a couple more bug reports. =)

first, you cannot unmark a track in jemplode. it seems like it's working until you sync, and then it goes right back to being marked. i haven't tried it the other way around.

second, you did a good job locking the parent window in the middle of a sync, but you can still drag folders/files (i got impatient and wanted to try it. it's the QA in me from my first few jobs). it messes up the playlists afterwards. you don't really see it in jemplode, but when you open up emplode, you see that there are all these errors.

i had to delete all my playlists (which wasn't a problem afterwards. the hashing algo just made the necessary refs again). which brings me to another pc<>empeg sync idea. have an action where it will check all hashes on the pc. if there are hashes on the empeg that don't match anything on the pc, assume that the pc copy was deleted or changed and delete the track from the empeg. much better than my trashbin idea, imo.

oh, and lastly, the auto-sort on sync doesn't work if you pick anything other than title in the configuration. i've tried using Artist instead, but it has no effect. maybe i'm not doing it right.

i'll be waiting for your other project. =) thanks

#133762 - 09/01/2003 12:23 Re: more jemplode ideas [Re: image]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
1) I think this is actually a player bug, but I haven't verified this in a while

2) oops

3) this doesn't work in the general case because you can sync onto your empeg from multiple places, so there's basically no way to tell if it was added from another machine or deleted from your primary. this is one of the main reasons this is such a nasty problem ...

4) I don't remember what this feature does; I don't really use it... I'll check.

#133763 - 09/01/2003 12:33 Re: more jemplode ideas [Re: mschrag]
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
1) emplode in b13 is able to do it. probably a change in protocol?

3) then have it work like a palm hotsync. let the user decide to go one way or the other, or go both ways so the end result looks equal.

#133764 - 09/01/2003 19:03 Re: more jemplode ideas [Re: image]
Daria
carpal tunnel

Registered: 24/01/2002
Posts: 3937
Loc: Providence, RI
first, you cannot unmark a track in jemplode. it seems like it's working until you sync, and then it goes right back to being marked. i haven't tried it the other way around.

I never noticed this, but I can't think what I unmarked in 45. I was waiting for bulk unmark. I should try it and see.

#133765 - 10/01/2003 12:27 Re: more jemplode ideas [Re: image]
sirmanson
journeyman

Registered: 06/03/2002
Posts: 70
Loc: Tucson, AZ USA
I think all of these ideas are great and would definitely use them. On the suggestion for when there is a song on the empeg that is not on the local drive (having jemplode remove it from the empeg): I think it would also be useful to let the user select whether they want the empeg file deleted or downloaded so that the local drive matches.
_________________________
----- RioCar 60gb

#133766 - 16/01/2003 08:42 Re: more jemplode ideas [Re: mschrag]
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
the more i think about it, the more i feel that a hotsync type method would be the best for syncing up your hdd and empeg. hashing opened up a lot of possibilities. the user should have the following choices in the interface:
1) pc -> empeg: all different hashes on pc transferred over to empeg (new songs), all different on empeg deleted (deleted songs).
2) empeg -> pc: vice versa from above.
3) pc <-> empeg: both of above, but without the deletions.
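The three choices above boil down to set differences over the hashes on each side. A rough sketch, assuming each side can be summarised as a set of hash strings; the class, enum, and field names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical planner: decides what to copy or delete from two hash sets. */
final class SyncPlan {
    enum Mode { PC_TO_EMPEG, EMPEG_TO_PC, BOTH_WAYS }

    final Set<String> copyToEmpeg = new TreeSet<>();
    final Set<String> copyToPc = new TreeSet<>();
    final Set<String> deleteFromEmpeg = new TreeSet<>();
    final Set<String> deleteFromPc = new TreeSet<>();

    static SyncPlan plan(Mode mode, Set<String> pcHashes, Set<String> empegHashes) {
        // Hashes present on one side only.
        Set<String> onlyOnPc = new TreeSet<>(pcHashes);
        onlyOnPc.removeAll(empegHashes);
        Set<String> onlyOnEmpeg = new TreeSet<>(empegHashes);
        onlyOnEmpeg.removeAll(pcHashes);

        SyncPlan plan = new SyncPlan();
        switch (mode) {
            case PC_TO_EMPEG: // choice 1: the pc wins
                plan.copyToEmpeg.addAll(onlyOnPc);
                plan.deleteFromEmpeg.addAll(onlyOnEmpeg);
                break;
            case EMPEG_TO_PC: // choice 2: the empeg wins
                plan.copyToPc.addAll(onlyOnEmpeg);
                plan.deleteFromPc.addAll(onlyOnPc);
                break;
            case BOTH_WAYS:   // choice 3: copy both ways, delete nothing
                plan.copyToEmpeg.addAll(onlyOnPc);
                plan.copyToPc.addAll(onlyOnEmpeg);
                break;
        }
        return plan;
    }
}
```

Because the user picks the direction explicitly, the planner never has to guess whether a missing hash means "deleted here" or "added there".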

also, do the same with the tags. i commonly change the tags on the hdd, and wish to reflect that on the empeg. for people who edit tags in jemplode, you can reverse the process to sync up the tags on the pc. obviously, a pc <-> empeg sync for tags isn't feasible if you edited some tags on the empeg and some on your pc.... there should be a way to do this individually also (or by playlist, etc) so you don't lose your edits.

now for the above, i wouldn't mind to go thru the whole hashing process, just as long as it autosyncs after (so i can leave it unattended).

#133767 - 16/01/2003 21:09 Re: more jemplode ideas [Re: image]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
so i realized one of the problems here ... the slowness is in calculating that hash value. But ultimately, to be able to find a duplicate later, you HAVE to calculate the hash value when you import a new tune. That means you have one of two cases:

1) the quick lookup doesn't find any hits, so this tune is new, but then you have to calc its hash so you can look for its duplicates next time
2) the quick lookup matches, then to know for sure if it's a match or not, you have to calc the hash

so it seems that you have to end up computing that hash for every tune no matter what. am I being stupid here and missing something obvious?

#133768 - 16/01/2003 22:27 Re: more jemplode ideas [Re: mschrag]
genixia
Carpal Tunnel

Registered: 08/02/2002
Posts: 3411
Hmm.... you're only talking about a one-time per import calculation, correct?

And the concern is the time taken calculating the hash, presumably when doing large imports?

I wonder whether the hashing can be done at sync-time, and pipelined into the process, reducing the total hash cost to the number of hashes that need to be completed until we find a track that needs to be transferred. (At which point, we are transfer bandwidth limited, hashing can continue in the background)

For the record though, I've just md5sum'd 47 mp3s in 30 seconds, on a Celeron 500, so I don't think it'd be that big an issue. ( gnu md5sum binary on linux )
_________________________
Mk2a 60GB Blue. Serial 030102962 sig.mp3: File Format not Valid.

#133769 - 17/01/2003 01:26 Re: more jemplode ideas [Re: mschrag]
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
to reiterate, quicklookup is a hash calculated from bits of a track.

case 1) is correct. case 2) is not. in my mind, jemplode assumes that a quicklookup match is the same file, no hash checking involved. since all it's doing is either skipping or setting a ref, the hash should already be in the tags. the user purposely risks a collision when enabling this feature. speed vs accuracy.

if the user becomes ultra paranoid that quicklookup is missing songs, he has the option to do a full hash compare (i.e. sync in previous post above). in this case, jemplode should notify user if quicklookups are the same but the full hash isn't. if this situation ever happened, i would imagine that you can "blacklist" the quicklookup, so matching songs will revert back to calculating the hash for the one track.

so, 3 cases when using quick lookup.
1) quicklookup has no hits, hash is calculated, track uploaded
2) quicklookup has a hit, ref/skip the track, hash not calculated
3) quicklookup has a hit that's blacklisted, hash calculated and looked up for hits (regular algorithm).
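The three cases can be written down as one small decision function. This is only a sketch: the names are hypothetical, and it assumes the known quicklookup values and the blacklist are both kept as in-memory sets.

```java
import java.util.Set;

/** Hypothetical decision step for one track during import. */
final class QuickLookup {
    enum Action {
        FULL_HASH_THEN_UPLOAD, // case 1: no hit, so the track is new
        SKIP_OR_REF,           // case 2: trusted hit, full hash not calculated
        FULL_HASH_COMPARE      // case 3: blacklisted hit, regular algorithm
    }

    static Action decide(long quickHash, Set<Long> knownQuickHashes, Set<Long> blacklist) {
        if (!knownQuickHashes.contains(quickHash)) {
            return Action.FULL_HASH_THEN_UPLOAD;
        }
        if (blacklist.contains(quickHash)) {
            return Action.FULL_HASH_COMPARE;
        }
        return Action.SKIP_OR_REF;
    }
}
```

Only case 2 avoids reading the whole file, so the saving depends on how much of an import is already on the empeg.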

#133770 - 17/01/2003 06:27 Re: more jemplode ideas [Re: genixia]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
I have considered computing at sync time, actually, since you have to read the file in to sync it -- I don't know if the order works out (I think tags are synced first, but I need to check).
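For illustration, computing the checksum while the file is being read for transfer could look roughly like this. It is a sketch, not jEmplode's sync code: the class and method names are made up, and the `OutputStream` merely stands in for whatever actually sends bytes to the player.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

/** Hypothetical helper: hashes a stream as a side effect of copying it. */
final class HashWhileCopying {
    /** Copies everything from in to out and returns the CRC32 of the bytes copied. */
    static long copyAndHash(InputStream in, OutputStream out) throws IOException {
        // CheckedInputStream updates the checksum with every byte read through it.
        CheckedInputStream checked = new CheckedInputStream(in, new CRC32());
        byte[] buffer = new byte[64 * 1024];
        int read;
        while ((read = checked.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return checked.getChecksum().getValue();
    }
}
```

The file is read only once this way, but the hash is not known until the transfer finishes, which is where the ordering question about tags comes in.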

As far as md5sum speed -- I don't think Java's going to get anywhere near that 47 in 30s, but I'll test it out to see ... incidentally, I used to use MD5 but switched to a CRC32.

ms

#133771 - 17/01/2003 06:30 Re: more jemplode ideas [Re: image]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
Ah -- I didn't realize you were using the quick match as a "reliable match". Clearly that changes things. I need to do some tests and see what the false negative % is.

#133772 - 17/01/2003 07:11 Re: more jemplode ideas [Re: genixia]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
I thought about this a little more and did some tests ... if you did 50 mp3's in 30s, take your average MP3 collection of at least 4000 songs (given the numbers from the people on this board, it's probably a lot higher average) and you're talking like 40 minutes of just computing hash codes. That's EVERY import if it stays the current way (= compute hash code on file to see if there is a duplicate).

I ran a CRC32 on my 4000 tunes and only added 4 minutes to the length of import. Not quite as offensive.

Incidentally, to just enumerate every file in my collection and display the name (with the overhead of scrolling the output, which actually does add time, surprisingly), it takes 13 seconds.

ms

#133773 - 17/01/2003 08:19 Re: more jemplode ideas [Re: mschrag]
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
here's my speedtest.

importing using jemplode took 25 minutes on 30.3gigs consisting of 3041 tracks.

here's an idea similar to pipelining. since most of the time goes to reading the mp3 (which is evident when i import a small folder twice in a row to different playlists), not to actually calculating the hash, does java have the facilities to buffer, say, the first 200megs of tracks in memory before starting to do CRC32? maybe two threads, where the read-ahead will always have more cpu priority and stay ahead?

EDIT: actually, after doing more thinking, i don't think that this is feasible. no way will reading ahead even come remotely close to staying ahead of the crc check. once the files are in memory, crc32 seems instantaneous.


Edited by iMaGe (17/01/2003 08:25)

#133774 - 17/01/2003 09:15 Re: more jemplode ideas [Re: image]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
incidentally, importing the same folder twice is still computing the hash twice ...

ms

#133775 - 20/01/2003 17:29 Re: more jemplode ideas [Re: mschrag]
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
have you considered using adler32 to replace crc32? i was researching hashes, and it seems to be as much as 20x faster than crc32, and java natively supports it, if i read right.
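Both algorithms ship in `java.util.zip` behind the same `Checksum` interface, so swapping one for the other is a small change. A rough comparison harness might look like this; the class and method names are hypothetical, and the timing helper is a crude measurement, not a proper benchmark.

```java
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical harness for comparing java.util.zip checksum implementations. */
final class ChecksumCompare {
    /** Runs one checksum over the whole buffer and returns its value. */
    static long checksum(Checksum algorithm, byte[] data) {
        algorithm.reset();
        algorithm.update(data, 0, data.length);
        return algorithm.getValue();
    }

    /** Crude wall-clock timing: nanoseconds for the given number of passes. */
    static long nanosFor(Checksum algorithm, byte[] data, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            checksum(algorithm, data);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16 * 1024 * 1024]; // 16 MB of zeros as a stand-in for track data
        System.out.println("CRC32   ns: " + nanosFor(new CRC32(), data, 10));
        System.out.println("Adler32 ns: " + nanosFor(new Adler32(), data, 10));
    }
}
```

Whatever the numbers come out as, disk reads probably dominate a real import, so the difference between the two may matter less than the raw timings suggest.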

EDIT: i saw that you've benchmarked it already using the jsdk, but have you tried porting a more optimized version?


Edited by iMaGe (20/01/2003 17:31)

#133776 - 21/01/2003 06:19 Re: more jemplode ideas [Re: image]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
I had not tried a non-sun impl of Adler32, however as a result of discussions on the alpha mailing list, you don't need to send any more ideas .... I can't say any more or Rob will give the OK to those men in black that keep following me around.

#133777 - 21/01/2003 08:01 Re: more jemplode ideas [Re: mschrag]
Yang
addict

Registered: 14/01/2002
Posts: 443
Loc: Raleigh, NC
i.e., problem solved, no need to worry..

#133778 - 21/01/2003 09:09 Re: more jemplode ideas [Re: mschrag]
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
interpretation: jemplode feature set (present and proposed) merged with emplode.

hopefully.

#133779 - 21/01/2003 11:38 Re: more jemplode ideas [Re: image]
genixia
Carpal Tunnel

Registered: 08/02/2002
Posts: 3411
Has anyone else noticed that Mike's sig has changed?

Didn't it use to say something along the lines of:

"Make me buy a Rio Central - Implement Central <-> Empeg synching"

???
_________________________
Mk2a 60GB Blue. Serial 030102962 sig.mp3: File Format not Valid.

#133780 - 21/01/2003 13:18 Re: more jemplode ideas [Re: genixia]
smu
old hand

Registered: 30/07/2000
Posts: 879
Loc: Germany (Ruhrgebiet)
I think you are confusing two users here: msaeger and mschrag

msaeger still has the line you are talking of, mschrag never had it.

cu,
sven
_________________________
proud owner of MkII 40GB & MkIIa 60GB both lit by God and HiJacked by Lord

#133781 - 21/01/2003 14:06 Re: more jemplode ideas [Re: smu]
genixia
Carpal Tunnel

Registered: 08/02/2002
Posts: 3411
Ah.... you're right. Thanks.

I guess that predicting Central - Empeg synch in 2.0rc1 would be a complete shot in the dark then.
_________________________
Mk2a 60GB Blue. Serial 030102962 sig.mp3: File Format not Valid.

#133782 - 21/01/2003 15:57 Re: more jemplode ideas [Re: genixia]
msaeger
carpal tunnel

Registered: 23/09/2000
Posts: 3608
Loc: Minnetonka, MN
That's mine

Maybe I need a bigger font; it's not working
_________________________

Matt
