
Invelos Forums > DVD Profiler: Contribution Discussion
Parsing
surfeur51 (DVD Profiler Unlimited Registrant, Star Contributor)
Since July 3, 2003
Registered: March 29, 2007
Reputation: Great Rating
France | Posts: 4,479
Quoting m.cellophane:

If the program could ignore parsing differences, I think we'd be on a good path to eliminating many of the issues.


Unfortunately, linking problems do not come only from parsing. As I already wrote in another thread, linking problems have several causes. Judging from my collection and the problems I had to solve, the reasons are:

1/  different credits for the same actor, with a contributor not using the common name from the CLT
2/  incorrect transcription of names written in capital letters, with accents omitted
3/  typos by contributors when copying credits
4/  Asian names that are sometimes in Asian order, sometimes in Western order (Gong Li/Li Gong, Zhang Ziyi/Ziyi Zhang)
5/  different actors with the same name, entered without a birth year

Those reasons account for more than 80% of linking problems. Ignoring parsing will solve none of them.

6/  obviously incorrect parsing by contributors ignoring the rules on titles and articles: about 15% of linking problems, which could probably be solved by automatic filters
7/  difficult parsing: about 5%

I'm not against a solution that ignores parsing. But in fact it would solve only a very small percentage of linking problems.
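A minimal Python sketch can make this point concrete. It assumes, hypothetically, that a credit is stored as three fields (first/middle/last) and that "ignoring parsing" means comparing the fields joined with single spaces. The helper `parsing_insensitive_key` and the sample pairs are illustrative only, not DVD Profiler's actual code. Such a key links two parsings of the same name, but not the accent, name-order or typo cases from the list.

```python
def parsing_insensitive_key(first: str, middle: str, last: str) -> str:
    """Hypothetical helper: join the three name fields, ignoring how the words were split.

    Empty fields disappear and runs of whitespace collapse to one space, so
    'Robin / / Wright Penn' and 'Robin / Wright / Penn' give the same key.
    """
    return " ".join(" ".join([first, middle, last]).split())


# Sample pairs for some of the causes listed above (illustrative data only).
pairs = {
    "parsing": (("Robin", "", "Wright Penn"), ("Robin", "Wright", "Penn")),
    "accents": (("Gérard", "", "Depardieu"), ("Gerard", "", "Depardieu")),
    "name order": (("Gong", "", "Li"), ("Li", "", "Gong")),
    "typo": (("Danny", "", "DeVito"), ("Danny", "", "DeVitto")),
}

for cause, (a, b) in pairs.items():
    linked = parsing_insensitive_key(*a) == parsing_insensitive_key(*b)
    print(f"{cause:10} linked: {linked}")
```

Only the "parsing" pair prints True; the other three stay unlinked, which is the gap between items 1-5 and items 6-7.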
Images from movies
hal9g (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Who is John Galt?
Registered: March 13, 2007
Reputation: High Rating
United States | Posts: 6,635
Quoting surfeur51:
I'm not against a solution that ignores parsing. But in fact it would solve only a very small percentage of linking problems.


More importantly, it would eliminate these endless, and useless, pages and pages of debates on parsing!

.......or maybe not! 
Hal
Voltaire53 (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Missed again!
Registered: March 13, 2007
Reputation: High Rating
United Kingdom | Posts: 2,293
Quoting lyonsden5:
The only thing that would (IMO) is an amendment to the rules to give direction.


True, but I thought we had that at one point, or at least a widely followed agreement (though vociferously objected to by some, of course!), that we put the first name in the first field, the last name in the last field and anything else in the middle field, and that we totally ignored whether the person in question thought of (say) their last two names as a surname or not.

However, the idea that the program could automatically ignore parsing for linking purposes would solve a huge portion of the problem, and I think it's an excellent idea.
It is dangerous to be right in matters where established men are wrong
SpaceFreakMicha (DVD Profiler Unlimited Registrant, Star Contributor)
Jesus-Freak
Registered: March 13, 2007
Reputation: High Rating
Germany | Posts: 1,774
Quoting hal9g:
More importantly, it would eliminate these endless, and useless, pages and pages of debates on parsing!

.......or maybe not! 


Don't worry, we have a lot of other problems that could be "discussed" (i.e., used for flame wars):
Asian names and Japanese romanization, CLT results and how to interpret them, copying and pasting cast/crew from a different profile, aspect ratio (actual vs. rounded), audio tracks that are available but not selectable via the menu...


OK, on topic:
I don't understand why we need a "standard" for parsing. Whether you start with 1/2/3 or 1/2 3 and then document a change to the other makes no difference. Unless the data is verified somehow, it could be wrong. So there is no point in saying that 1/2/3 has to be the standard or that 1/2 3 has to be the standard, because both could be wrong without further verification.

The database won't be any more correct by starting with 1/2/3 than it would be by starting with 1/2 3.

IMHO this whole discussion is much ado about nothing. So why waste so much time on endless debates, when that time could be used to actually verify or change the parsing of the few actors where it is necessary?
 Last edited: by SpaceFreakMicha
RHo (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 13, 2007
Posts: 2,759
Quoting Voltaire53:

True, but I thought we had that at one point, or at least a widely followed agreement (though vociferously objected to by some, of course!), that we put the first name in the first field, the last name in the last field and anything else in the middle field, and that we totally ignored whether the person in question thought of (say) their last two names as a surname or not.

No, this word-counting approach was developed in an external forum by an unofficial, self-appointed rules committee. It never made it into the official Invelos rules, nor has it ever been the general consensus in this forum.
sugarjoe (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 15, 2007
Germany | Posts: 374
If there were another, intelligent mechanism to link cast & crew (and some good ideas have been put forward here in the forum), then parsing and the way you write a person's name wouldn't matter.

I think this would be a big improvement, and the end of a lot of (partly fruitless) discussions.
Winston Smith (DVD Profiler Unlimited Registrant, Star Contributor)
Don't be discommodious
Registered: March 13, 2007
United States | Posts: 21,610
Quoting Voltaire53:
Quoting lyonsden5:
The only thing that would (IMO) is an amendment to the rules to give direction.


True, but I thought we had that at one point, or at least a widely followed agreement (though vociferously objected to by some, of course!), that we put the first name in the first field, the last name in the last field and anything else in the middle field, and that we totally ignored whether the person in question thought of (say) their last two names as a surname or not.

However, the idea that the program could automatically ignore parsing for linking purposes would solve a huge portion of the problem, and I think it's an excellent idea.

Quite right, Voltaire, I think that most of us have been following this. But unfortunately most isn't good enough: all it takes is ONE user (Rho) who decides he doesn't want to do it that way, but wants to do it his own way instead, to make a complete hash of everything.

Skip
ASSUME NOTHING!!!!!!
CBE, MBE, MoA and proud of it.
Outta here

Billy Video
SpaceFreakMicha (DVD Profiler Unlimited Registrant, Star Contributor)
Jesus-Freak
Registered: March 13, 2007
Reputation: High Rating
Germany | Posts: 1,774
Quoting Dr Pavlov:
Quoting Voltaire53:
Quoting lyonsden5:
The only thing that would (IMO) is an amendment to the rules to give direction.


True, but I thought we had that at one point, or at least a widely followed agreement (though vociferously objected to by some, of course!), that we put the first name in the first field, the last name in the last field and anything else in the middle field, and that we totally ignored whether the person in question thought of (say) their last two names as a surname or not.

However, the idea that the program could automatically ignore parsing for linking purposes would solve a huge portion of the problem, and I think it's an excellent idea.

Quite right, Voltaire, I think that most of us have been following this. But unfortunately most isn't good enough: all it takes is ONE user (Rho) who decides he doesn't want to do it that way, but wants to do it his own way instead, to make a complete hash of everything.

Skip


Could you please provide us with a link to this "agreement"?
 Last edited: by SpaceFreakMicha
TheMadMartian (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Alien with an attitude
Registered: March 13, 2007
Reputation: Highest Rating
United States | Posts: 13,201
Quoting SpaceFreakMicha:
OK, on topic:
I don't understand why we need a "standard" for parsing. Whether you start with 1/2/3 or 1/2 3 and then document a change to the other makes no difference. Unless the data is verified somehow, it could be wrong. So there is no point in saying that 1/2/3 has to be the standard or that 1/2 3 has to be the standard, because both could be wrong without further verification.

The database won't be any more correct by starting with 1/2/3 than it would be by starting with 1/2 3.

IMHO this whole discussion is much ado about nothing. So why waste so much time on endless debates, when that time could be used to actually verify or change the parsing of the few actors where it is necessary?

The reason it is an issue is linking.  Let me try and explain using Robin Wright Penn.

Let's say you enter her in a profile, as Robin/ /Wright Penn and Skip enters her in a different profile as Robin/Wright/Penn.  When I download those two profiles, I will have two different actor entries.  If I double click on one, the profile...or profiles...with the other will not come up.  For them to link, I would have to change one of the names.  For a lot of people, this is unacceptable.
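The Robin Wright Penn example can be sketched in Python, assuming (hypothetically) that the local database identifies an actor by the exact (first, middle, last) fields. The profile titles are placeholders, and this is not a description of how DVD Profiler is actually implemented. Keying on the three fields gives two actor entries; keying on the full name with the field split ignored gives one.

```python
from collections import defaultdict

# Each credit: (profile title, (first, middle, last)). Placeholder data only.
credits = [
    ("Profile A", ("Robin", "", "Wright Penn")),
    ("Profile B", ("Robin", "Wright", "Penn")),
]

# Exact-field linking: the three fields together are the actor's identity.
by_fields = defaultdict(list)
for title, name in credits:
    by_fields[name].append(title)

print(len(by_fields))                           # 2 separate actor entries
print(by_fields[("Robin", "", "Wright Penn")])  # ['Profile A'] only

# Parsing-insensitive linking: identity is the full name, split ignored.
by_full_name = defaultdict(list)
for title, name in credits:
    by_full_name[" ".join(" ".join(name).split())].append(title)

print(len(by_full_name))                        # 1 actor entry
print(by_full_name["Robin Wright Penn"])        # ['Profile A', 'Profile B']
```

In the first index, "double clicking" one entry only ever finds one profile; in the second, both profiles hang off the same entry.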
No dictator, no invader can hold an imprisoned population by force of arms forever.
There is no greater power in the universe than the need for freedom.
Against this power, governments and tyrants and armies cannot stand.
The Centauri learned this lesson once.
We will teach it to them again.
Though it take a thousand years, we will be free.
- Citizen G'Kar
SpaceFreakMicha (DVD Profiler Unlimited Registrant, Star Contributor)
Jesus-Freak
Registered: March 13, 2007
Reputation: High Rating
Germany | Posts: 1,774
Quoting TheMadMartian:
The reason it is an issue is linking.  Let me try and explain using Robin Wright Penn.

Let's say you enter her in a profile, as Robin/ /Wright Penn and Skip enters her in a different profile as Robin/Wright/Penn.  When I download those two profiles, I will have two different actor entries.  If I double click on one, the profile...or profiles...with the other will not come up.  For them to link, I would have to change one of the names.  For a lot of people, this is unacceptable.


I see the problem, but I can't see why 1/2/3 is any better as a starting point than 1/2 3.
VirusPil (DVD Profiler Unlimited Registrant, Star Contributor)
uncredited
Registered: January 1, 2009
Reputation: Highest Rating
Germany | Posts: 3,087
Quoting eaglejd:
....
How is this parsed?

List/ of/ Accepted/ Parsed/ Names/ with/ Documentation?

List of Accepted Parsed//Names with Documentation?

List of Accepted/ Parsed Names/ with Documentation?



Seriously? Of course it's a stage name, so it is

List of Accepted Parsed Names with Documentation//

TheMadMartian (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Alien with an attitude
Registered: March 13, 2007
Reputation: Highest Rating
United States | Posts: 13,201
Quoting SpaceFreakMicha:
I see the problem, but I can't see why 1/2/3 is any better as a starting point than 1/2 3.

I don't know that it is any better; it's neutral, but not better. As I said in another thread, it really doesn't matter to me as I don't care about linking, but I would like it to work for those who do, and a set starting point is the only solution that I can think of.
Winston Smith (DVD Profiler Unlimited Registrant, Star Contributor)
Don't be discommodious
Registered: March 13, 2007
United States | Posts: 21,610
Quoting SpaceFreakMicha:
Quoting TheMadMartian:
The reason it is an issue is linking.  Let me try and explain using Robin Wright Penn.

Let's say you enter her in a profile, as Robin/ /Wright Penn and Skip enters her in a different profile as Robin/Wright/Penn.  When I download those two profiles, I will have two different actor entries.  If I double click on one, the profile...or profiles...with the other will not come up.  For them to link, I would have to change one of the names.  For a lot of people, this is unacceptable.


I see the problem, but I can't see why 1/2/3 is any better as a starting point than 1/2 3.

I have explained this many times, Space.

Skip
gardibolt (DVD Profiler Unlimited Registrant)
digitally Obsessed
Registered: March 13, 2007
Posts: 1,414
The linking should ignore capitalization too, just like the contribution system and the CLT do.
"This movie has warped my fragile little mind."
hal9g (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Who is John Galt?
Registered: March 13, 2007
Reputation: High Rating
United States | Posts: 6,635
Quoting gardibolt:
The linking should ignore capitalization too, just like the contribution system and the CLT do.


Good point, except that locally, I think you can only have one capitalization of a given name, and linking happens in your local db.
Hal
 Last edited: by hal9g
TheMadMartian (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Alien with an attitude
Registered: March 13, 2007
Reputation: Highest Rating
United States | Posts: 13,201
Quoting gardibolt:
The linking should ignore capitalization too, just like the contribution system and the CLT do.

It already does, kinda.  You can't have both Danny DeVito and Danny Devito in your local db.  If you have DeVito, and download a profile with Devito, the local program will use DeVito.  Parsing, in my opinion, should be done exactly the same way...though I don't know how difficult that would be to program.
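A rough Python sketch of the behaviour described above, extended to parsing as suggested. It assumes a hypothetical local name table (`LocalNameTable`, `resolve` are made-up names) that matches names ignoring both capitalization and parsing and keeps whichever form was stored first. This is a guess at how such a feature could work, not a description of DVD Profiler's internals.

```python
from __future__ import annotations


class LocalNameTable:
    """Hypothetical local cast/crew table that keeps one stored form per person.

    Names match on a key that ignores capitalization and parsing; the first
    form stored locally is the one every later profile links to.
    """

    def __init__(self) -> None:
        self._stored: dict[str, tuple[str, str, str]] = {}

    @staticmethod
    def _key(name: tuple[str, str, str]) -> str:
        # Join the fields, collapse whitespace, then fold case.
        return " ".join(" ".join(name).split()).casefold()

    def resolve(self, name: tuple[str, str, str]) -> tuple[str, str, str]:
        """Return the locally stored form, storing `name` first if it is new."""
        return self._stored.setdefault(self._key(name), name)


table = LocalNameTable()
print(table.resolve(("Danny", "", "DeVito")))       # ('Danny', '', 'DeVito')
print(table.resolve(("Danny", "", "Devito")))       # ('Danny', '', 'DeVito')
print(table.resolve(("Robin", "", "Wright Penn")))  # ('Robin', '', 'Wright Penn')
print(table.resolve(("Robin", "Wright", "Penn")))   # ('Robin', '', 'Wright Penn')
```

One consequence of this design is that whichever profile reaches the local database first decides the stored parsing, just as it decides the stored capitalization in the DeVito example.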