How to extract the brand from a product name

Updated: 2023-11-30 12:38:04

The information you're trying to get isn't actually there.

If you take two strings, both of which may have any number of spaces, and join them together with a space, it's no longer possible to tell unambiguously which space was joining the two strings, and which spaces were part of the strings.

So, you have a few options:

First, there aren't that many spaces in each product name, so you can just try all the possibilities: brand "Brave" and product "Soul Men's Swansea Jeans - Denim", then brand "Brave Soul" and product "Men's Swansea Jeans - Denim", then brand "Brave Soul Men's" and product "Swansea Jeans - Denim", and so on for the other 3 possibilities.
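As a rough sketch of that brute-force idea, assuming the title is split on whitespace, something like the following would enumerate every split. The function name `candidate_splits` is made up for illustration, and how you then test each candidate (for example, a request per candidate) is left out.

```python
def candidate_splits(title):
    """Return every (brand, product) pair obtained by cutting the title at a space."""
    words = title.split()
    # Cut after word 1, after word 2, ... up to just before the last word,
    # so that neither the brand nor the product is empty.
    return [
        (" ".join(words[:i]), " ".join(words[i:]))
        for i in range(1, len(words))
    ]


if __name__ == "__main__":
    for brand, product in candidate_splits("Brave Soul Men's Swansea Jeans - Denim"):
        print(repr(brand), "|", repr(product))
```

Note that a plain whitespace split treats the "-" as a word of its own, which is why this title yields six candidates.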

Second, if you can scrape a list of all brand names from somewhere else and stash them in a set (or a database table or whatever), you can pre-filter the possibilities before trying them all in comparatively slow web requests to Amazon. For example, if you have a list of all the brands, just check which among Brave, Brave Soul, Brave Soul Men's, Brave Soul Men's Swansea, etc. are actual brands, and only test those.
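A minimal sketch of that pre-filter might look like this, assuming you already have the brand names in a Python set. The name `brand_candidates` and the contents of `known_brands` are hypothetical; where the real list comes from is up to you.

```python
def brand_candidates(title, known_brands):
    """Return the leading word sequences of the title that are known brands."""
    words = title.split()
    # Build "Brave", "Brave Soul", "Brave Soul Men's", ... and keep the real brands.
    prefixes = (" ".join(words[:i]) for i in range(1, len(words)))
    return [prefix for prefix in prefixes if prefix in known_brands]


if __name__ == "__main__":
    # Hypothetical brand list; in practice this would be scraped or loaded from a table.
    known_brands = {"Brave Soul", "Levi's"}
    print(brand_candidates("Brave Soul Men's Swansea Jeans - Denim", known_brands))
```

Set membership is an exact, case-sensitive match, so you may want to normalize case and whitespace on both sides before comparing.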

Meanwhile, this still isn't going to be perfect, because there are almost certainly cases that are ambiguous. For example, there's a brand Apple, and also a brand Apple Records, so what happens when you try to split up Apple Records Master Collection? You've got two valid possibilities, not just one. All you can do is design your code to deal with that in some way (and unit test that you did so correctly).
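One possible way to deal with it, sketched below, is to return every valid split and make the tie-breaking rule an explicit, separate step. Preferring the longest brand is only an assumption here, not the one right answer, and the names `split_brand` and `pick_longest` are invented for the example.

```python
def split_brand(title, known_brands):
    """Return all (brand, product) splits whose brand part is a known brand."""
    words = title.split()
    matches = []
    for i in range(1, len(words)):
        brand = " ".join(words[:i])
        if brand in known_brands:
            matches.append((brand, " ".join(words[i:])))
    return matches


def pick_longest(matches):
    """Tie-break policy (an assumption): prefer the longest brand, or None if no match."""
    return max(matches, key=lambda match: len(match[0]), default=None)


if __name__ == "__main__":
    brands = {"Apple", "Apple Records"}  # hypothetical brand list
    matches = split_brand("Apple Records Master Collection", brands)
    # The ambiguous case: both splits are valid, so the code must choose explicitly.
    assert matches == [
        ("Apple", "Records Master Collection"),
        ("Apple Records", "Master Collection"),
    ]
    assert pick_longest(matches) == ("Apple Records", "Master Collection")
    assert pick_longest([]) is None
```

Keeping `split_brand` free of any policy means the ambiguous case stays visible to the caller, and the assertions at the bottom are the kind of unit test that pins down whichever rule you settle on.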